On Fri, Aug 26, 2011 at 8:54 PM, Aldy Hernandez <al...@redhat.com> wrote:
> This is a "slight" update from the last revision, with your issues addressed
> as I explained in the last email.  However, everything turned out to be much
> trickier than I expected (variable length offsets with arrays, bit fields
> spanning multiple words, surprising padding gymnastics by GCC, etc etc).
>
> It turns out that what we need is to know the precise bit region size at all
> times, and adjust it as we rearrange and cut things into pieces throughout
> the RTL bit field machinery.
>
> I enabled the C++ memory model, and forced a bootstrap and regression test
> with it.  This brought about many interesting cases, which I was able to
> distill and add to the testsuite.
>
> Of particular interest was the struct-layout-1.exp tests.  Since many of the
> tests set a global bit field, only to later check it against a local
> variable containing the same value, it is the perfect stressor because,
> while globals are restricted under the memory model, locals are not.  So we
> can check that we can interoperate with the less restrictive model, and that
> the patch does not introduce ABI inconsistencies.  After much grief, we are
> now passing all the struct-layout-1.exp tests. Eventually, I'd like to force
> the struct-layout-1.exp tests to run for "--param allow-store-data-races=0"
> as well.  Unfortunately, this will increase testing time.
>
> I have (unfortunately) introduced an additional call to
> get_inner_reference(), but only for the field itself (one time).  I can't
> remember the details, but it was something to the effect of the bit position +
> padding being impossible to calculate in one variable array reference case.
> I can dig up the case if you'd like.
>
> I am currently tackling a reload miscompilation failure while building a
> 32-bit library.  I am secretly hoping your review will uncover the flaw
> without me having to pick this up.  Otherwise, this is a much more
> comprehensive approach than what is currently in mainline, and we now pass
> all the bitfield tests the GCC testsuite could throw at it.
>
> Fire away.

+  /* Be as conservative as possible on variable offsets.  */
+  if (TREE_OPERAND (exp, 2)
+      && !host_integerp (TREE_OPERAND (exp, 2), 1))
+    {
+      *byte_offset = TREE_OPERAND (exp, 2);
+      *maxbits = BITS_PER_UNIT;
+      return;
+    }

Shouldn't this be at the very beginning of the function?  Because
you've set *bit_offset to an offset that was _not_ calculated relative
to TREE_OPERAND (exp, 2).  And you'll avoid ICEing

+         /* bitoff = start_byte * 8 - (fld.byteoff * 8 + fld.bitoff) */
+         t = fold_build2 (MINUS_EXPR, size_type_node,
+                          fold_build2 (PLUS_EXPR, size_type_node,
+                                       fold_build2 (MULT_EXPR, size_type_node,
+                                                    toffset, bits),
+                                       build_int_cst (integer_type_node,
+                                                      tbitpos)),
+                          fold_build2 (MULT_EXPR, size_type_node,
+                                       *byte_offset, bits));
+
+         *bit_offset = tree_low_cst (t, 1);

here in case t isn't an INTEGER_CST.  The comment before the tree
formula above doesn't match it; please update it.  If
*bit_offset is supposed to be relative to *byte_offset then it should
be easy to calculate it without another get_inner_reference.
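
Something like this untested sketch is what I have in mind (assuming
the variable-offset case above has already returned early, so
*byte_offset and the field position are INTEGER_CSTs here and the
tree_low_cst calls are safe):

  /* Bit position of the field within its containing record; both
     values are INTEGER_CSTs because the variable-offset case bailed
     out earlier.  */
  tree field = TREE_OPERAND (exp, 1);
  HOST_WIDE_INT fld_bitpos
    = tree_low_cst (DECL_FIELD_OFFSET (field), 1) * BITS_PER_UNIT
      + tree_low_cst (DECL_FIELD_BIT_OFFSET (field), 1);
  /* Bit offset relative to *byte_offset, with no second walk over
     the reference tree.  */
  *bit_offset = fld_bitpos - tree_low_cst (*byte_offset, 1) * BITS_PER_UNIT;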

Btw, *byte_offset is still not relative to the containing object as
documented, but relative to the base object of the exp reference
tree (thus, to a in a.i.j.k.l instead of to a.i.j.k).  If it were supposed
to be relative to a.i.j.k get_inner_reference would be not needed
either.  Can you clarify what "containing object" means in the
overall comment please?

If it is really relative to the innermost reference of exp you can
"CSE" the offset of TREE_OPERAND (exp, 0) and do relative
adjustments for all the other get_inner_reference calls.  For
example the

+  /* If we found the end of the bit field sequence, include the
+     padding up to the next field...  */
   if (fld)
     {
...
+      /* Calculate bitpos and offset of the next field.  */
+      get_inner_reference (build3 (COMPONENT_REF,
+                                  TREE_TYPE (exp),
+                                  TREE_OPERAND (exp, 0),
+                                  fld, NULL_TREE),
+                          &tbitsize, &end_bitpos, &end_offset,
+                          &tmode, &tunsignedp, &tvolatilep, true);

case is not correct anyway: fld may have a variable position
(a non-INTEGER_CST DECL_FIELD_OFFSET), so you can't
assume

+      *maxbits = TREE_INT_CST_LOW (maxbits_tree);

here.
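
The "CSE" variant could look like this (untested sketch; the
variable-position case of course still has to bail out first):

  /* Decompose the base object exactly once ...  */
  HOST_WIDE_INT base_bitsize, base_bitpos;
  tree base_offset;
  enum machine_mode base_mode;
  int base_unsignedp, base_volatilep;
  get_inner_reference (TREE_OPERAND (exp, 0), &base_bitsize, &base_bitpos,
                       &base_offset, &base_mode, &base_unsignedp,
                       &base_volatilep, true);

  /* ... and place FLD relative to it, instead of building a new
     COMPONENT_REF and walking the whole reference tree again.  */
  if (host_integerp (DECL_FIELD_OFFSET (fld), 1))
    end_bitpos = base_bitpos
                 + tree_low_cst (DECL_FIELD_OFFSET (fld), 1) * BITS_PER_UNIT
                 + tree_low_cst (DECL_FIELD_BIT_OFFSET (fld), 1);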

+  /* ...otherwise, this is the last element in the structure.  */
   else
     {
-      /* If this is the last element in the structure, include the padding
-        at the end of structure.  */
-      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+      /* Include the padding at the end of structure.  */
+      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
+       - TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start));
+      /* Round up to the next byte.  */
+      *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);
     }

So you weren't convinced by my worries about tail-padding re-use?
And you blindly assume a constant-size record_type ...
and you don't account for the DECL_FIELD_OFFSET of bitregion_start
(shouldn't you simply use (and compute) a byte_offset relative to
the start of the record)?  Well, I still think you cannot include the
padding at the end of the structure (if TREE_OPERAND (exp, 0) is
a COMPONENT_REF as well then its DECL_SIZE can be different
from its TYPE_SIZE).
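
To make the tail-padding problem concrete (C++, where the Itanium ABI
re-uses the tail padding of a non-POD base):

  struct A { int x; char y; A (); };  // non-POD: sizeof (A) == 8,
                                      // but only 5 bytes of data
  struct B : A { char i, j, k; };     // i, j, k live in A's tail padding

The FIELD_DECL for the A base subobject inside B gets a DECL_SIZE of
40 bits while TYPE_SIZE (A) is 64 bits, so extending the bit region
of a store to b.A::y up to TYPE_SIZE would clobber b.i, b.j and b.k.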

+             bitregion_byte_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+                                             bitregion_byte_offset,
+                                             build_int_cst (integer_type_node,
+                                                            bitpos / BITS_PER_UNIT));

General remark: you should be using sizetype for byte offsets,
bitsizetype for bit-offset trees, and size_binop for the computations,
instead of fold_build2 (this applies everywhere).  And thus pass
size_zero_node as store_field's bitregion_byte_offset.
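
For the hunk above that would be something like (untested):

  bitregion_byte_offset
    = size_binop (MINUS_EXPR, bitregion_byte_offset,
                  size_int (bitpos / BITS_PER_UNIT));

with bitregion_byte_offset kept in sizetype throughout, so the
size_binop operands agree in type.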

Can you split out the get_best_mode two param removal pieces?  Consider
them pre-approved.

Why do you need to adjust store_bit_field with the extra parameter?
Can't you simply pass an adjusted str_rtx from the single caller
where it can be non-zero?
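
That is, something like (untested sketch; assumes the region byte
offset has been folded to a constant by then, and that bitpos is made
relative to the displaced MEM):

  /* Displace the MEM once in the caller ...  */
  str_rtx = adjust_address (str_rtx, GET_MODE (str_rtx),
                            tree_low_cst (bitregion_byte_offset, 1));
  /* ... and keep store_bit_field's signature as it is.  */
  store_bit_field (str_rtx, bitsize, bitpos, fieldmode, value);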

Thanks,
Richard.
