On Tue, Oct 28, 2025 at 1:20 AM Richard Biener <[email protected]> wrote: > > On Mon, Oct 27, 2025 at 9:41 PM Andrew Pinski > <[email protected]> wrote: > > > > On Mon, Oct 27, 2025 at 1:07 PM Andrew Pinski > > <[email protected]> wrote: > > > > > > On Mon, Oct 27, 2025 at 5:56 AM Richard Biener > > > <[email protected]> wrote: > > > > > > > > On Mon, Oct 27, 2025 at 10:22 AM Andrew Pinski > > > > <[email protected]> wrote: > > > > > > > > > > On Mon, Oct 27, 2025 at 1:50 AM Richard Biener > > > > > <[email protected]> wrote: > > > > > > > > > > > > On Sat, Oct 25, 2025 at 6:52 AM Andrew Pinski > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > After r16-4081-g966cdec2b2 which added folding of > > > > > > > __builtin_assume_aligned, > > > > > > > forwprop would propagate pointers that lower alignment replacing > > > > > > > ones with > > > > > > > greater alignment. This causes us to lose alignment information > > > > > > > that > > > > > > > __builtin_assume_aligned provided to expand. Normally this just > > > > > > > loses some > > > > > > > optimizations except in the s390 case where the alignment is > > > > > > > specifically > > > > > > > checked and was for inlining of the atomics; without this patch > > > > > > > an infininite > > > > > > > loop would happen. > > > > > > > > > > > > > > Note this was previously broken for -Og before > > > > > > > r16-4081-g966cdec2b2. This > > > > > > > fixes -Og case as forwprop is used instead of copyprop. > > > > > > > > > > > > > > This moves the testcase for pr107389.c to torture to get a > > > > > > > generic testcase. > > > > > > > pr107389.c was originally for -O0 case but we should test for > > > > > > > other > > > > > > > optimization levels so this is not lost again. > > > > > > > > > > > > > > align-5.c is xfailed because __builtin_assume_aligned is not > > > > > > > instrumented for ubsan > > > > > > > alignment and ubsan check to see pointer is aligned before > > > > > > > emitting a check for the > > > > > > > load (based on the known alignment in compiling). See PR 122038 > > > > > > > too. I had mentioned > > > > > > > this issue previously in r16-4081-g966cdec2b2 too. > > > > > > > > > > > > So what made you not chose to make use of > > > > > > maybe_duplicate_ssa_info_at_copy > > > > > > in forwprop like copyprop does? > > > > > > > > > > Because maybe_duplicate_ssa_info_at_copy is already used in forwprop: > > > > > ``` > > > > > static void > > > > > fwprop_set_lattice_val (tree name, tree val) > > > > > { > > > > > if (TREE_CODE (name) == SSA_NAME) > > > > > { > > > > > if (SSA_NAME_VERSION (name) >= lattice.length ()) > > > > > { > > > > > lattice.reserve (num_ssa_names - lattice.length ()); > > > > > lattice.quick_grow_cleared (num_ssa_names); > > > > > } > > > > > lattice[SSA_NAME_VERSION (name)] = val; > > > > > /* As this now constitutes a copy duplicate points-to > > > > > and range info appropriately. */ > > > > > if (TREE_CODE (val) == SSA_NAME) > > > > > maybe_duplicate_ssa_info_at_copy (name, val); > > > > > } > > > > > } > > > > > ``` > > > > > and it is not working for arguments as the bb on argument's > > > > > SSA_NAME_DEF_STMT is NULL. > > > > > > > > > > Also it would not work for the s390 case where the > > > > > __builtin_assume_aligned is in a different bb. > > > > > The s390 case which was: > > > > > ``` > > > > > if (((intptr_t)a & 0xf)==0) > > > > > { > > > > > __atomic_load(__builtin_assume_aligned(a, 16)) > > > > > } > > > > > ``` > > > > > (yes this one should actually be handled differently by ranger at > > > > > expansion but that is NOT hooked up yet). > > > > > > > > Hmm, but how's that a regression from copyprop then? Or is it not? > > > > > > Let me step back a second. > > > Before moving __builtin_assume_aligned folding over forwprop; it was > > > part of fab. There was no copyprop after fab (except for -Og; will get > > > back to this in a few). > > > So at expand time the ssa name for the __atomic_*/memcpy/memset > > > functions would have the alignment that is specified by the > > > __builtin_assume_aligned. > > > Now after the moving of __baa to forwprop, the alignment information > > > would be lost due to the copy prop. > > > > > > > > > > > > > > As for the specific case of __atomic_load the fix is of course to > > > > put alignment info onto the load itself and not rely on (possibly > > > > context sensitive) alignment info on the SSA pointer. > > > > > > I agree but it is not just __atomic_* functions but also memset, > > > memcpy, memmove would also need this on each ptr argument. Getting a > > > good design here is something I suspect will take a few weeks so I > > > wanted to get a stop gap for GCC 16 (yes I know we have too many of > > > those before). > > > > > > > > > > > I believe all those "late" alignment info things can be categorized into > > > > the info being not in the proper place (where RTL gets this correct). > > > > > > Yes I agree. I filed > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122449 to go back over > > > and figure out a decent way of fixing this. I don't have a full design > > > at this stage and I am not sure I will be able to work on it until > > > December. > > One of the more "ugly" ways would be to handle it similar to how we have > WITH_SIZE_EXPR around aggregate arguments sometimes, have a > WITH_ALIGNMENT wrapping SSA pointer arguments. It's of course likely > that this kind of wrapping would cause too much fallout. Another possibility > would be to extend the pointer info we have on gcalls - we have > two pt_solutions in them already, we could as well put in a tree pointer > to a pointer typed INTEGER_CST specifying both TBAA info and alignment > in case the call represents a pure memory load or store (not good enough for > memcpy of course, which would need two). Bonus points for eventually > offloading all this to optional on-the-side (GCed, so shareable) info > to make the default gcall smaller. > And then there's the IFN "solution", but IMO that's ugly for memcpy, but I > think that the atomic builtins should have been IFNs from the start. > > > I forgot to say something about the -Og situation. So -Og has been > > broken since r0-124039-g8a845901e41653 which moved pass_fold_builtins > > before pass_copy_prop; 4.9.0/4.8.2. I am not sure -Og is used enough > > for people to have noticed. It is definitely not used at all to > > compile libatomic which is where the s390 issue comes from. > > This is the other reason why I just trying to do a stop gap for GCC 16 > > was because -Og was broken for over 12 years and nobody noticed it. > > I see. I wonder whether it would be better to leave __builtin_assume_aligned > around then, because that inherently introduces the copy and it would show > why. > TER / SSA coalescing might make a mess our of the copies you leave in place > anyway, no?
After thinking for a day, I agree. I will submit a patch to remove __builtin_assume_aligned folding and revert my other patch. I also had looked into the history the folding in fab for this and it was added when the builtin was added and there is no mention of why that part was needed as far as I could tell too. Thanks, Andrew Pinski > > Richard. > > > Thanks, > > Andrew > > > > > > > > Thanks, > > > Andrew > > > > > > > > > > > Richard. > > > > > > > > > > > > > > Thanks, > > > > > Andrew Pinski > > > > > > > > > > > > > > > > > > PR middle-end/107389 > > > > > > > PR tree-optimization/122086 > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > * tree-ssa-forwprop.cc (forwprop_may_propagate_copy): New > > > > > > > function. > > > > > > > (pass_forwprop::execute): Use forwprop_may_propagate_copy > > > > > > > instead of may_propagate_copy. > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > > > * gcc.dg/pr107389.c: Move to... > > > > > > > * gcc.dg/torture/pr107389.c: ...here. Skip for lto. > > > > > > > Use dg-additional-options rather than dg-options. > > > > > > > * c-c++-common/ubsan/align-5.c: xfail. > > > > > > > > > > > > > > Signed-off-by: Andrew Pinski <[email protected]> > > > > > > > --- > > > > > > > gcc/testsuite/c-c++-common/ubsan/align-5.c | 4 ++- > > > > > > > gcc/testsuite/gcc.dg/{ => torture}/pr107389.c | 3 +- > > > > > > > gcc/tree-ssa-forwprop.cc | 32 > > > > > > > ++++++++++++++++--- > > > > > > > 3 files changed, 33 insertions(+), 6 deletions(-) > > > > > > > rename gcc/testsuite/gcc.dg/{ => torture}/pr107389.c (73%) > > > > > > > > > > > > > > diff --git a/gcc/testsuite/c-c++-common/ubsan/align-5.c > > > > > > > b/gcc/testsuite/c-c++-common/ubsan/align-5.c > > > > > > > index 484790134a6..6d2ac26612b 100644 > > > > > > > --- a/gcc/testsuite/c-c++-common/ubsan/align-5.c > > > > > > > +++ b/gcc/testsuite/c-c++-common/ubsan/align-5.c > > > > > > > @@ -11,4 +11,6 @@ foo (char *p) > > > > > > > return *q; > > > > > > > } > > > > > > > > > > > > > > -/* { dg-final { scan-assembler "__ubsan_handle" } } */ > > > > > > > +/* xfail, see PR 122038 as __builtin_assume_aligned should be > > > > > > > instrumented instead > > > > > > > + of only the load. */ > > > > > > > +/* { dg-final { scan-assembler "__ubsan_handle" { xfail *-*-* } > > > > > > > } } */ > > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr107389.c > > > > > > > b/gcc/testsuite/gcc.dg/torture/pr107389.c > > > > > > > similarity index 73% > > > > > > > rename from gcc/testsuite/gcc.dg/pr107389.c > > > > > > > rename to gcc/testsuite/gcc.dg/torture/pr107389.c > > > > > > > index deb63380704..23c2776ab73 100644 > > > > > > > --- a/gcc/testsuite/gcc.dg/pr107389.c > > > > > > > +++ b/gcc/testsuite/gcc.dg/torture/pr107389.c > > > > > > > @@ -1,5 +1,6 @@ > > > > > > > /* { dg-do compile } */ > > > > > > > -/* { dg-options "-fdump-tree-optimized-alias" } */ > > > > > > > +/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */ > > > > > > > +/* { dg-additional-options "-fdump-tree-optimized-alias" } */ > > > > > > > > > > > > > > unsigned foo (void *p) > > > > > > > { > > > > > > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc > > > > > > > index 68e80baa46c..98c98dd8be4 100644 > > > > > > > --- a/gcc/tree-ssa-forwprop.cc > > > > > > > +++ b/gcc/tree-ssa-forwprop.cc > > > > > > > @@ -216,6 +216,30 @@ static bitmap to_purge; > > > > > > > /* Const-and-copy lattice. */ > > > > > > > static vec<tree> lattice; > > > > > > > > > > > > > > +/* forwprop_may_propagate_copy is like may_propagate_copy except > > > > > > > + after the final folding, don't allow propagating of pointer > > > > > > > ORIG > > > > > > > + that have a lower alignment than the DEST name. > > > > > > > + This is to prevent losing alignment information from __ > > > > > > > builtin_assume_aligned for expand (See also PR 122086). */ > > > > > > > +static bool > > > > > > > +forwprop_may_propagate_copy (tree dest, tree orig, > > > > > > > + bool dest_not_abnormal_phi_edge_p = > > > > > > > false) > > > > > > > +{ > > > > > > > + if (!may_propagate_copy (dest, orig, > > > > > > > dest_not_abnormal_phi_edge_p)) > > > > > > > + return false; > > > > > > > + > > > > > > > + /* Only check alignment for the final folding. */ > > > > > > > + if (!(cfun->curr_properties & PROP_last_full_fold)) > > > > > > > + return true; > > > > > > > + > > > > > > > + /* Alignment only matters for pointer types. */ > > > > > > > + if (!POINTER_TYPE_P (TREE_TYPE (dest)) || !POINTER_TYPE_P > > > > > > > (TREE_TYPE (orig))) > > > > > > > + return true; > > > > > > > + > > > > > > > + unsigned aligndest = get_pointer_alignment (dest); > > > > > > > + unsigned alignorig = get_pointer_alignment (orig); > > > > > > > + return aligndest <= alignorig; > > > > > > > +} > > > > > > > + > > > > > > > /* Set the lattice entry for NAME to VAL. */ > > > > > > > static void > > > > > > > fwprop_set_lattice_val (tree name, tree val) > > > > > > > @@ -5177,7 +5201,7 @@ pass_forwprop::execute (function *fun) > > > > > > > } > > > > > > > if (all_same) > > > > > > > { > > > > > > > - if (may_propagate_copy (res, first)) > > > > > > > + if (forwprop_may_propagate_copy (res, first)) > > > > > > > to_remove_defs.safe_push (SSA_NAME_VERSION (res)); > > > > > > > fwprop_set_lattice_val (res, first); > > > > > > > } > > > > > > > @@ -5532,7 +5556,7 @@ pass_forwprop::execute (function *fun) > > > > > > > { > > > > > > > if (!is_gimple_debug (stmt)) > > > > > > > bitmap_set_bit (simple_dce_worklist, > > > > > > > SSA_NAME_VERSION (use)); > > > > > > > - if (may_propagate_copy (use, val)) > > > > > > > + if (forwprop_may_propagate_copy (use, val)) > > > > > > > { > > > > > > > propagate_value (usep, val); > > > > > > > substituted_p = true; > > > > > > > @@ -5695,7 +5719,7 @@ pass_forwprop::execute (function *fun) > > > > > > > /* If we can propagate the lattice-value mark > > > > > > > the > > > > > > > stmt for removal. */ > > > > > > > if (val != lhs > > > > > > > - && may_propagate_copy (lhs, val)) > > > > > > > + && forwprop_may_propagate_copy (lhs, val)) > > > > > > > to_remove_defs.safe_push (SSA_NAME_VERSION > > > > > > > (lhs)); > > > > > > > fwprop_set_lattice_val (lhs, val); > > > > > > > } > > > > > > > @@ -5717,7 +5741,7 @@ pass_forwprop::execute (function *fun) > > > > > > > continue; > > > > > > > tree val = fwprop_ssa_val (arg); > > > > > > > if (val != arg > > > > > > > - && may_propagate_copy (arg, val, !(e->flags & > > > > > > > EDGE_ABNORMAL))) > > > > > > > + && forwprop_may_propagate_copy (arg, val, > > > > > > > !(e->flags & EDGE_ABNORMAL))) > > > > > > > propagate_value (use_p, val); > > > > > > > } > > > > > > > > > > > > > > -- > > > > > > > 2.43.0 > > > > > > >
