On Mon, Jun 9, 2025 at 2:49 AM Richard Biener <richard.guent...@gmail.com> wrote: > > On Sun, Jun 8, 2025 at 7:52 PM Andrew Pinski <quic_apin...@quicinc.com> wrote: > > > > While thinking about how to implement the rest of the copy prop and makes > > sure not > > to introduce some compile time problems, optimize_agr_copyprop should be > > changed > > into a forwproping rather than looking backwards. > > Can you explain the issues when doing the backwards looking?
There is no direct problems with backwards looking. The problem comes if we want to say prop into a function call the original of the copy say: ``` a = b; f(a); ``` Backwards looking means you need to look back for all aggregate function call arguments. So you waste time always looking back. Instead of looking forwards only when there is a copy. we could also even do some proping like for: ``` a = LC0; _t = a[i]; ``` Which sometimes shows up. Here it does not matter if we load from LC0 or a for _t for most cores; though there might be a store forward penalty in some cases. Doing this prop even without fully removing the copy is better in the end. > > Note since this is a pure local "propagation" I still wonder whether > (now in the forward direction) > we want to use single_imm_use () instead of iterating over all uses of > the VDEF. Basically > we want to keep the optimization as "local" as possible. So for vops, single_imm_use is not true if we have say: ``` t = a; _2 = *b; c = t; ``` or: ``` t = a; if (c) goto BB2; else BB3; BB2: c = t; ... BB3: c = t; ``` > > I also wondered about > > b = a; > c = b; > d = b; > > we'll transform c = b into c = a and that's it. > > In the end I still hope we can leverage more global analysis, like > that of SRA, to decide > whether propagation is profitable or not, thus whether we can elide a > temporary. In > that regard, shouldn't we avoid the propagation if 'DEST' has its > address taken given > we certainly cannot elide it? I am not even sure it matters on avoiding the propagation because for large aggregates you will be getting an memcpy and it might be faster not from the values which were just stored to. In the case of small aggregates, the memcpy will be inlined using load/stores and the RTL level will get rid of the loads that just happen and you end up with one set of loads followed by 2 sets of stores. We are also removing some of the store forward penalty in many cases even if we can't elide the copy. Thanks, Andrew > > Thanks, > Richard. > > > Bootstrapped and tested on x86_64-linux-gnu. > > > > gcc/ChangeLog: > > > > * tree-ssa-forwprop.cc (optimize_agr_copyprop): Change into a > > forward looking (looking at vdef's uses) instead of a back > > looking (vuse's def). > > > > Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com> > > --- > > gcc/tree-ssa-forwprop.cc | 121 +++++++++++++++++++-------------------- > > 1 file changed, 60 insertions(+), 61 deletions(-) > > > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc > > index 43b1c9d696f..2dc77ccba1d 100644 > > --- a/gcc/tree-ssa-forwprop.cc > > +++ b/gcc/tree-ssa-forwprop.cc > > @@ -1344,12 +1344,12 @@ optimize_memcpy_to_memset (gimple_stmt_iterator > > *gsip, tree dest, tree src, tree > > return true; > > } > > /* Optimizes > > - a = c; > > - b = a; > > + DEST = SRC; > > + DEST2 = DEST; # DEST2 = SRC2; > > into > > - a = c; > > - b = c; > > - GSIP is the second statement and SRC is the common > > + DEST = SRC; > > + DEST2 = SRC; > > + GSIP is the first statement and SRC is the common > > between the statements. > > */ > > static bool > > @@ -1365,65 +1365,64 @@ optimize_agr_copyprop (gimple_stmt_iterator *gsip) > > if (operand_equal_p (dest, src, 0)) > > return false; > > > > - tree vuse = gimple_vuse (stmt); > > - /* If the vuse is the default definition, then there is no store > > beforehand. */ > > - if (SSA_NAME_IS_DEFAULT_DEF (vuse)) > > - return false; > > - gimple *defstmt = SSA_NAME_DEF_STMT (vuse); > > - if (!gimple_assign_load_p (defstmt) > > - || !gimple_store_p (defstmt)) > > - return false; > > - if (gimple_has_volatile_ops (defstmt)) > > - return false; > > - > > - tree dest2 = gimple_assign_lhs (defstmt); > > - tree src2 = gimple_assign_rhs1 (defstmt); > > - > > - /* If the original store is `src2 = src2;` skip over it. */ > > - if (operand_equal_p (src2, dest2, 0)) > > - return false; > > - if (!operand_equal_p (src, dest2, 0)) > > - return false; > > - > > - > > - /* For 2 memory refences and using a temporary to do the copy, > > - don't remove the temporary as the 2 memory references might overlap. > > - Note t does not need to be decl as it could be field. > > - See PR 22237 for full details. > > - E.g. > > - t = *a; > > - *b = t; > > - Cannot be convert into > > - t = *a; > > - *b = *a; > > - Though the following is allowed to be done: > > - t = *a; > > - *a = t; > > - And convert it into: > > - t = *a; > > - *a = *a; > > - */ > > - if (!operand_equal_p (src2, dest, 0) > > - && !DECL_P (dest) && !DECL_P (src2)) > > - return false; > > - > > - if (dump_file && (dump_flags & TDF_DETAILS)) > > + tree vdef = gimple_vdef (stmt); > > + imm_use_iterator iter; > > + gimple *use_stmt; > > + bool changed = false; > > + FOR_EACH_IMM_USE_STMT (use_stmt, iter, vdef) > > { > > - fprintf (dump_file, "Simplified\n "); > > - print_gimple_stmt (dump_file, stmt, 0, dump_flags); > > - fprintf (dump_file, "after previous\n "); > > - print_gimple_stmt (dump_file, defstmt, 0, dump_flags); > > - } > > - gimple_assign_set_rhs_from_tree (gsip, unshare_expr (src2)); > > - update_stmt (stmt); > > + if (!gimple_assign_load_p (use_stmt) > > + || !gimple_store_p (use_stmt)) > > + continue; > > + if (gimple_has_volatile_ops (use_stmt)) > > + continue; > > + tree dest2 = gimple_assign_lhs (use_stmt); > > + tree src2 = gimple_assign_rhs1 (use_stmt); > > + /* If the new store is `src2 = src2;` skip over it. */ > > + if (operand_equal_p (src2, dest2, 0)) > > + continue; > > + if (!operand_equal_p (dest, src2, 0)) > > + continue; > > + /* For 2 memory refences and using a temporary to do the copy, > > + don't remove the temporary as the 2 memory references might > > overlap. > > + Note t does not need to be decl as it could be field. > > + See PR 22237 for full details. > > + E.g. > > + t = *a; #DEST = SRC; > > + *b = t; #DEST2 = SRC2; > > + Cannot be convert into > > + t = *a; > > + *b = *a; > > + Though the following is allowed to be done: > > + t = *a; > > + *a = t; > > + And convert it into: > > + t = *a; > > + *a = *a; > > + */ > > + if (!operand_equal_p (dest2, src, 0) > > + && !DECL_P (dest2) && !DECL_P (src)) > > + continue; > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + { > > + fprintf (dump_file, "Simplified\n "); > > + print_gimple_stmt (dump_file, use_stmt, 0, dump_flags); > > + fprintf (dump_file, "after previous\n "); > > + print_gimple_stmt (dump_file, stmt, 0, dump_flags); > > + } > > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); > > + gimple_assign_set_rhs_from_tree (&gsi, unshare_expr (src)); > > + update_stmt (use_stmt); > > > > - if (dump_file && (dump_flags & TDF_DETAILS)) > > - { > > - fprintf (dump_file, "into\n "); > > - print_gimple_stmt (dump_file, stmt, 0, dump_flags); > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + { > > + fprintf (dump_file, "into\n "); > > + print_gimple_stmt (dump_file, use_stmt, 0, dump_flags); > > + } > > + statistics_counter_event (cfun, "copy prop for aggregate", 1); > > + changed = true; > > } > > - statistics_counter_event (cfun, "copy prop for aggregate", 1); > > - return true; > > + return changed; > > } > > > > /* *GSI_P is a GIMPLE_CALL to a builtin function. > > -- > > 2.43.0 > >