Hi,

On Wed, Jun 29, 2011 at 02:52:36PM +0200, Richard Guenther wrote:
> On Tue, 28 Jun 2011, Martin Jambor wrote:
> > On Tue, Jun 28, 2011 at 03:01:17PM +0200, Richard Guenther wrote:
> > > On Tue, Jun 28, 2011 at 2:50 PM, Martin Jambor <mjam...@suse.cz> wrote:
> > > > Hi,
> > > >
> > > > at the moment SRA can get confused by alignment padding and think
> > > > that it actually contains some data for which there is no planned
> > > > replacement, and thus it might leave some loads and stores in place
> > > > instead of removing them.  This is perhaps the biggest problem when
> > > > we attempt total scalarization of simple structures, which we do
> > > > precisely in order to get rid of these loads and stores and of the
> > > > variables altogether.
> > > >
> > > > I've pondered for quite a while how best to deal with this.  One
> > > > option was to make just the total scalarization stronger.  I have also
> > > > contemplated creating phantom accesses for the padding I could detect
> > > > (i.e. in simple structures), which would be more general, but this
> > > > would complicate the parts of SRA which are already quite convoluted,
> > > > and I was not really sure it was worth it.
> > > >
> > > > Eventually I decided for the total scalarization option.  This patch
> > > > changes it such that the flag is propagated down the access tree but
> > > > also, if it does not work out, is reset on the way up.  If the flag
> > > > survives, the access tree is considered "covered" by scalar
> > > > replacements and thus it is known not to contain unscalarized data.
> > > >
> > > > While changing the function analyze_access_subtree, I have simplified
> > > > the way we compute the hole flag and also fixed one comparison which
> > > > we currently have the wrong way round.  Fortunately this does not
> > > > matter, because if there is a hole, covered_to will never add up to
> > > > the total size.  I'll probably post a separate patch against 4.6 just
> > > > in case someone attempts to read the source.
> > > >
> > > > Bootstrapped and tested on x86_64-linux, OK for trunk?
> > > 
> > > So, what will it do for the testcase?
> > > 
> > > The following is what I _think_ it should do:
> > > 
> > > <bb 2>:
> > >   l = *p_1(D);
> > >   l$i_6 = p_1(D)->i;
> > >   D.2700_2 = l$i_6;
> > >   D.2701_3 = D.2700_2 + 1;
> > >   l$i_12 = D.2701_3;
> > >   *p_1(D) = l;
> > >   p_1(D)->i = l$i_12;
> > > 
> > > and let FRE/DSE do their job (which they don't do, unfortunately).
> > > So does your patch then remove the load/store from/to l but keep
> > > the elementwise loads/stores (which are probably cleaned up by FRE)?
> > > 
> > 
> > Well, that is what would happen if no total scalarization were going
> > on.  Total scalarization is a poor man's aggregate copy propagation:
> > it splits up small structures into individual fields whenever we can
> > get rid of the structures this way (i.e. if they are never used in
> > anything but assignments).  I introduced it to fix PR 42585 - but
> > unfortunately the padding problem did not occur to me until this
> > winter.
> > 
> > Currently, SRA performs very badly on the testcase, creating:
> > 
> > <bb 2>:
> >   l = *p_1(D);
> >   l$i_6 = p_1(D)->i;
> >   l$f1_8 = p_1(D)->f1;
> >   l$f2_9 = p_1(D)->f2;
> >   l$f3_10 = p_1(D)->f3;
> >   l$f4_11 = p_1(D)->f4;
> >   D.1966_2 = l$i_6;
> >   D.1967_3 = D.1966_2 + 1;
> >   l$i_12 = D.1967_3;
> >   *p_1(D) = l;              <-- this should not be here
> >   p_1(D)->i = l$i_12;
> >   p_1(D)->f1 = l$f1_8;
> >   p_1(D)->f2 = l$f2_9;
> >   p_1(D)->f3 = l$f3_10;
> >   p_1(D)->f4 = l$f4_11;
> >   return;
> > 
> > Unfortunately, this basically survives all the way to the "optimized"
> > dump.  With the patch, the assignment *p_1(D) = l; is removed and
> > copyprop1 and cddce1 turn this into:
> > 
> > <bb 2>:
> >   l$i_6 = p_1(D)->i;
> >   D.1967_3 = l$i_6 + 1;
> >   p_1(D)->i = D.1967_3;
> >   return;
> > 
> > which is then the "optimized" gimple, already before IPA and at -O1.
> 
> Ok, that's certainly better.  Can you file a bug report for the FRE/DSE
> issue please?
> 
> > For the record, without total scalarization, the "optimized" gimple
> > would be:
> > 
> > <bb 2>:
> >   l = *p_1(D);
> >   l$i_6 = p_1(D)->i;
> >   D.1967_3 = l$i_6 + 1;
> >   *p_1(D) = l;
> >   p_1(D)->i = D.1967_3;
> >   return;
> > 
> > So at the moment FRE/DSE certainly do not help.  Eventually we
> > should do something like that, or real aggregate copy propagation,
> > but until then we probably need to live with the total scalarization
> > thingy - as I have learned in the PR mentioned above and a few
> > others, there are people who really want at least this functionality
> > now - and it should not perform this badly on structures with
> > padding.
> 
> The only thing I'm worried about is
> 
> struct X { int i; short s; pad p; int j; };
> struct Y { int i; short s; int j; };
> 
> int main()
> {
>   struct X x = { 1, 2, 3, 4 };
>   struct Y y = *(struct Y *)&x;
>   y.s++;
>   *(struct Y *)&x = y;
>   if (x.p != 3)
>     abort ();
>   return 0;
> }
> 
> untested, but any variant needs to make sure the padding value is not
> lost by means of assigning x to y and back.  The above should translate
> to y = MEM[&x + 0]; ... MEM[&x + 0] = y;
> 
> So if the result for sth like the above looks ok the patch is ok.

It splits the copies into a series of piecemeal scalar ones, and the
whole testcase is optimized to return zero soon afterwards.  I have
tried to imagine more intricate cases, even on and beyond the verge of
definedness, and always easily convinced myself that they will not do
anything unexpected (provided SRA does not give up, which it easily
will if scalar accesses - those created by total scalarization and all
others - overlap in any way).

The PR about FRE/DSE not being able to handle this is PR 49599 and I
have just committed the proposed patch.

Thanks,

Martin


> 
> Thanks,
> Richard.
> 
> > Martin
> > 
> > 
> > 
> > 
> > > Richard.
> > > 
> > > 
> > > > Thanks,
> > > >
> > > > Martin
> > > >
> > > >
> > > > 2011-06-24  Martin Jambor  <mjam...@suse.cz>
> > > >
> > > >        * tree-sra.c (struct access): Rename total_scalarization to
> > > >        grp_total_scalarization.
> > > >        (completely_scalarize_var): New function.
> > > >        (sort_and_splice_var_accesses): Set total_scalarization in the
> > > >        representative access.
> > > >        (analyze_access_subtree): Propagate total scalarization across
> > > >        the tree, no holes in totally scalarized trees, simplify
> > > >        coverage computation.
> > > >        (analyze_all_variable_accesses): Call completely_scalarize_var
> > > >        instead of completely_scalarize_record.
> > > >
> > > >        * testsuite/gcc.dg/tree-ssa/sra-12.c: New test.
> > > >
> > > > Index: src/gcc/tree-sra.c
> > > > ===================================================================
> > > > *** src.orig/gcc/tree-sra.c
> > > > --- src/gcc/tree-sra.c
> > > > *************** struct access
> > > > *** 170,179 ****
> > > >    /* Is this particular access write access? */
> > > >    unsigned write : 1;
> > > >
> > > > -   /* Is this access an artificial one created to scalarize some record
> > > > -      entirely? */
> > > > -   unsigned total_scalarization : 1;
> > > > -
> > > >    /* Is this access an access to a non-addressable field? */
> > > >    unsigned non_addressable : 1;
> > > >
> > > > --- 170,175 ----
> > > > *************** struct access
> > > > *** 204,209 ****
> > > > --- 200,209 ----
> > > >       is not propagated in the access tree in any direction.  */
> > > >    unsigned grp_scalar_write : 1;
> > > >
> > > > +   /* Is this access an artificial one created to scalarize some record
> > > > +      entirely? */
> > > > +   unsigned grp_total_scalarization : 1;
> > > > +
> > > >    /* Other passes of the analysis use this bit to make function
> > > >       analyze_access_subtree create scalar replacements for this group 
> > > > if
> > > >       possible.  */
> > > > *************** dump_access (FILE *f, struct access *acc
> > > > *** 377,402 ****
> > > >    fprintf (f, ", type = ");
> > > >    print_generic_expr (f, access->type, 0);
> > > >    if (grp)
> > > > !     fprintf (f, ", total_scalarization = %d, grp_read = %d, grp_write 
> > > > = %d, "
> > > > !            "grp_assignment_read = %d, grp_assignment_write = %d, "
> > > > !            "grp_scalar_read = %d, grp_scalar_write = %d, "
> > > >             "grp_hint = %d, grp_covered = %d, "
> > > >             "grp_unscalarizable_region = %d, grp_unscalarized_data = 
> > > > %d, "
> > > >             "grp_partial_lhs = %d, grp_to_be_replaced = %d, "
> > > >             "grp_maybe_modified = %d, "
> > > >             "grp_not_necessarilly_dereferenced = %d\n",
> > > > !            access->total_scalarization, access->grp_read, 
> > > > access->grp_write,
> > > > !            access->grp_assignment_read, access->grp_assignment_write,
> > > > !            access->grp_scalar_read, access->grp_scalar_write,
> > > >             access->grp_hint, access->grp_covered,
> > > >             access->grp_unscalarizable_region, 
> > > > access->grp_unscalarized_data,
> > > >             access->grp_partial_lhs, access->grp_to_be_replaced,
> > > >             access->grp_maybe_modified,
> > > >             access->grp_not_necessarilly_dereferenced);
> > > >    else
> > > > !     fprintf (f, ", write = %d, total_scalarization = %d, "
> > > >             "grp_partial_lhs = %d\n",
> > > > !            access->write, access->total_scalarization,
> > > >             access->grp_partial_lhs);
> > > >  }
> > > >
> > > > --- 377,402 ----
> > > >    fprintf (f, ", type = ");
> > > >    print_generic_expr (f, access->type, 0);
> > > >    if (grp)
> > > > !     fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read 
> > > > = %d, "
> > > > !            "grp_assignment_write = %d, grp_scalar_read = %d, "
> > > > !            "grp_scalar_write = %d, grp_total_scalarization = %d, "
> > > >             "grp_hint = %d, grp_covered = %d, "
> > > >             "grp_unscalarizable_region = %d, grp_unscalarized_data = 
> > > > %d, "
> > > >             "grp_partial_lhs = %d, grp_to_be_replaced = %d, "
> > > >             "grp_maybe_modified = %d, "
> > > >             "grp_not_necessarilly_dereferenced = %d\n",
> > > > !            access->grp_read, access->grp_write, 
> > > > access->grp_assignment_read,
> > > > !            access->grp_assignment_write, access->grp_scalar_read,
> > > > !            access->grp_scalar_write, access->grp_total_scalarization,
> > > >             access->grp_hint, access->grp_covered,
> > > >             access->grp_unscalarizable_region, 
> > > > access->grp_unscalarized_data,
> > > >             access->grp_partial_lhs, access->grp_to_be_replaced,
> > > >             access->grp_maybe_modified,
> > > >             access->grp_not_necessarilly_dereferenced);
> > > >    else
> > > > !     fprintf (f, ", write = %d, grp_total_scalarization = %d, "
> > > >             "grp_partial_lhs = %d\n",
> > > > !            access->write, access->grp_total_scalarization,
> > > >             access->grp_partial_lhs);
> > > >  }
> > > >
> > > > *************** completely_scalarize_record (tree base,
> > > > *** 924,930 ****
> > > >            access = create_access_1 (base, pos, size);
> > > >            access->expr = nref;
> > > >            access->type = ft;
> > > > !           access->total_scalarization = 1;
> > > >            /* Accesses for intraprocedural SRA can have their stmt 
> > > > NULL.  */
> > > >          }
> > > >        else
> > > > --- 924,930 ----
> > > >            access = create_access_1 (base, pos, size);
> > > >            access->expr = nref;
> > > >            access->type = ft;
> > > > !           access->grp_total_scalarization = 1;
> > > >            /* Accesses for intraprocedural SRA can have their stmt 
> > > > NULL.  */
> > > >          }
> > > >        else
> > > > *************** completely_scalarize_record (tree base,
> > > > *** 932,937 ****
> > > > --- 932,954 ----
> > > >        }
> > > >  }
> > > >
> > > > + /* Create total_scalarization accesses for all scalar type fields in
> > > > +    VAR and for VAR as a whole.  VAR must be of a RECORD_TYPE conforming
> > > > +    to type_consists_of_records_p.  */
> > > > +
> > > > + static void
> > > > + completely_scalarize_var (tree var)
> > > > + {
> > > > +   HOST_WIDE_INT size = tree_low_cst (DECL_SIZE (var), 1);
> > > > +   struct access *access;
> > > > +
> > > > +   access = create_access_1 (var, 0, size);
> > > > +   access->expr = var;
> > > > +   access->type = TREE_TYPE (var);
> > > > +   access->grp_total_scalarization = 1;
> > > > +
> > > > +   completely_scalarize_record (var, var, 0, var);
> > > > + }
> > > >
> > > >  /* Search the given tree for a declaration by skipping handled 
> > > > components and
> > > >     exclude it from the candidates.  */
> > > > *************** sort_and_splice_var_accesses (tree var)
> > > > *** 1713,1719 ****
> > > >        bool grp_assignment_read = access->grp_assignment_read;
> > > >        bool grp_assignment_write = access->grp_assignment_write;
> > > >        bool multiple_scalar_reads = false;
> > > > !       bool total_scalarization = access->total_scalarization;
> > > >        bool grp_partial_lhs = access->grp_partial_lhs;
> > > >        bool first_scalar = is_gimple_reg_type (access->type);
> > > >        bool unscalarizable_region = access->grp_unscalarizable_region;
> > > > --- 1730,1736 ----
> > > >        bool grp_assignment_read = access->grp_assignment_read;
> > > >        bool grp_assignment_write = access->grp_assignment_write;
> > > >        bool multiple_scalar_reads = false;
> > > > !       bool total_scalarization = access->grp_total_scalarization;
> > > >        bool grp_partial_lhs = access->grp_partial_lhs;
> > > >        bool first_scalar = is_gimple_reg_type (access->type);
> > > >        bool unscalarizable_region = access->grp_unscalarizable_region;
> > > > *************** sort_and_splice_var_accesses (tree var)
> > > > *** 1757,1763 ****
> > > >          grp_assignment_write |= ac2->grp_assignment_write;
> > > >          grp_partial_lhs |= ac2->grp_partial_lhs;
> > > >          unscalarizable_region |= ac2->grp_unscalarizable_region;
> > > > !         total_scalarization |= ac2->total_scalarization;
> > > >          relink_to_new_repr (access, ac2);
> > > >
> > > >          /* If there are both aggregate-type and scalar-type accesses 
> > > > with
> > > > --- 1774,1780 ----
> > > >          grp_assignment_write |= ac2->grp_assignment_write;
> > > >          grp_partial_lhs |= ac2->grp_partial_lhs;
> > > >          unscalarizable_region |= ac2->grp_unscalarizable_region;
> > > > !         total_scalarization |= ac2->grp_total_scalarization;
> > > >          relink_to_new_repr (access, ac2);
> > > >
> > > >          /* If there are both aggregate-type and scalar-type accesses 
> > > > with
> > > > *************** sort_and_splice_var_accesses (tree var)
> > > > *** 1778,1783 ****
> > > > --- 1795,1801 ----
> > > >        access->grp_assignment_read = grp_assignment_read;
> > > >        access->grp_assignment_write = grp_assignment_write;
> > > >        access->grp_hint = multiple_scalar_reads || total_scalarization;
> > > > +       access->grp_total_scalarization = total_scalarization;
> > > >        access->grp_partial_lhs = grp_partial_lhs;
> > > >        access->grp_unscalarizable_region = unscalarizable_region;
> > > >        if (access->first_link)
> > > > *************** analyze_access_subtree (struct access *r
> > > > *** 2023,2028 ****
> > > > --- 2041,2048 ----
> > > >        root->grp_write = 1;
> > > >        if (parent->grp_assignment_write)
> > > >        root->grp_assignment_write = 1;
> > > > +       if (parent->grp_total_scalarization)
> > > > +       root->grp_total_scalarization = 1;
> > > >      }
> > > >
> > > >    if (root->grp_unscalarizable_region)
> > > > *************** analyze_access_subtree (struct access *r
> > > > *** 2033,2048 ****
> > > >
> > > >    for (child = root->first_child; child; child = child->next_sibling)
> > > >      {
> > > > !       if (!hole && child->offset < covered_to)
> > > > !       hole = true;
> > > > !       else
> > > > !       covered_to += child->size;
> > > > !
> > > >        sth_created |= analyze_access_subtree (child, root,
> > > >                                             allow_replacements && 
> > > > !scalar);
> > > >
> > > >        root->grp_unscalarized_data |= child->grp_unscalarized_data;
> > > > !       hole |= !child->grp_covered;
> > > >      }
> > > >
> > > >    if (allow_replacements && scalar && !root->first_child
> > > > --- 2053,2068 ----
> > > >
> > > >    for (child = root->first_child; child; child = child->next_sibling)
> > > >      {
> > > > !       hole |= covered_to < child->offset;
> > > >        sth_created |= analyze_access_subtree (child, root,
> > > >                                             allow_replacements && 
> > > > !scalar);
> > > >
> > > >        root->grp_unscalarized_data |= child->grp_unscalarized_data;
> > > > !       root->grp_total_scalarization &= child->grp_total_scalarization;
> > > > !       if (child->grp_covered)
> > > > !       covered_to += child->size;
> > > > !       else
> > > > !       hole = true;
> > > >      }
> > > >
> > > >    if (allow_replacements && scalar && !root->first_child
> > > > *************** analyze_access_subtree (struct access *r
> > > > *** 2063,2072 ****
> > > >        sth_created = true;
> > > >        hole = false;
> > > >      }
> > > > !   else if (covered_to < limit)
> > > > !     hole = true;
> > > >
> > > > !   if (sth_created && !hole)
> > > >      {
> > > >        root->grp_covered = 1;
> > > >        return true;
> > > > --- 2083,2098 ----
> > > >        sth_created = true;
> > > >        hole = false;
> > > >      }
> > > > !   else
> > > > !     {
> > > > !       if (covered_to < limit)
> > > > !       hole = true;
> > > > !       if (scalar)
> > > > !       root->grp_total_scalarization = 0;
> > > > !     }
> > > >
> > > > !   if (sth_created
> > > > !       && (!hole || root->grp_total_scalarization))
> > > >      {
> > > >        root->grp_covered = 1;
> > > >        return true;
> > > > *************** analyze_all_variable_accesses (void)
> > > > *** 2288,2294 ****
> > > >                <= max_total_scalarization_size)
> > > >            && type_consists_of_records_p (TREE_TYPE (var)))
> > > >          {
> > > > !           completely_scalarize_record (var, var, 0, var);
> > > >            if (dump_file && (dump_flags & TDF_DETAILS))
> > > >              {
> > > >                fprintf (dump_file, "Will attempt to totally scalarize 
> > > > ");
> > > > --- 2314,2320 ----
> > > >                <= max_total_scalarization_size)
> > > >            && type_consists_of_records_p (TREE_TYPE (var)))
> > > >          {
> > > > !           completely_scalarize_var (var);
> > > >            if (dump_file && (dump_flags & TDF_DETAILS))
> > > >              {
> > > >                fprintf (dump_file, "Will attempt to totally scalarize 
> > > > ");
> > > > Index: src/gcc/testsuite/gcc.dg/tree-ssa/sra-12.c
> > > > ===================================================================
> > > > *** /dev/null
> > > > --- src/gcc/testsuite/gcc.dg/tree-ssa/sra-12.c
> > > > ***************
> > > > *** 0 ****
> > > > --- 1,25 ----
> > > > + /* Verify that SRA total scalarization will not be confused by 
> > > > padding.  */
> > > > +
> > > > + /* { dg-do compile } */
> > > > + /* { dg-options "-O1 -fdump-tree-release_ssa" } */
> > > > +
> > > > + struct S
> > > > + {
> > > > +   int i;
> > > > +   unsigned short f1;
> > > > +   char f2;
> > > > +   unsigned short f3, f4;
> > > > + };
> > > > +
> > > > +
> > > > + int foo (struct S *p)
> > > > + {
> > > > +   struct S l;
> > > > +
> > > > +   l = *p;
> > > > +   l.i++;
> > > > +   *p = l;
> > > > + }
> > > > +
> > > > + /* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa"} } */
> > > > + /* { dg-final { cleanup-tree-dump "release_ssa" } } */
> > > >
> > > >
> > 
