On Wed, Nov 19, 2025 at 12:09 AM Kugan Vivekanandarajah
<[email protected]> wrote:
>
>
>
> > On 18 Nov 2025, at 11:48 pm, Richard Biener <[email protected]> 
> > wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Mon, Nov 17, 2025 at 1:03 AM Kugan Vivekanandarajah
> > <[email protected]> wrote:
> >>
> >>
> >>
> >>> On 23 Oct 2025, at 8:37 pm, Richard Biener <[email protected]> 
> >>> wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Wed, Oct 22, 2025 at 12:12 PM Kugan Vivekanandarajah
> >>> <[email protected]> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On 22 Oct 2025, at 7:51 pm, Richard Biener <[email protected]> 
> >>>>> wrote:
> >>>>>
> >>>>> External email: Use caution opening links or attachments
> >>>>>
> >>>>>
> >>>>> On Tue, Oct 21, 2025 at 11:55 PM Kugan Vivekanandarajah
> >>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>> Hi Richard,
> >>>>>> Thanks for the review.
> >>>>>>
> >>>>>>> On 16 Oct 2025, at 11:06 pm, Richard Biener 
> >>>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>> External email: Use caution opening links or attachments
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Oct 15, 2025 at 11:08 PM Kugan Vivekanandarajah
> >>>>>>> <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> This patch enables Loop Invariant Code Motion (LICM) to hoist loads 
> >>>>>>>> that
> >>>>>>>> alias with stores rom the loaded value.
> >>>>>>>>
> >>>>>>>> The pattern a[i] = a[0] is common in TSVC benchmarks (s293):
> >>>>>>>>
> >>>>>>>> for (int i = 0; i < N; i++)
> >>>>>>>> a[i] = a[0];
> >>>>>>>>
> >>>>>>>> Previously, GCC conservatively rejected hoisting a[0] due to 
> >>>>>>>> potential
> >>>>>>>> aliasing when i==0. However, this is a "self write" - even when 
> >>>>>>>> aliasing
> >>>>>>>> occurs, we're writing back the same value, making hoisting safe.
> >>>>>>>>
> >>>>>>>> The optimization checks that:
> >>>>>>>> 1. One reference is a load, the other is a store
> >>>>>>>> 2. The stored SSA value equals the loaded SSA value
> >>>>>>>> 3. Only simple cases with single accesses per reference
> >>>>>>>>
> >>>>>>>> This enables vectorization of these patterns by allowing the 
> >>>>>>>> vectorizer
> >>>>>>>> to see the hoisted loop-invariant value.
> >>>>>>>>
> >>>>>>>> With the patch, the loop now vectorizes and generates:
> >>>>>>>>
> >>>>>>>>     .L2:
> >>>>>>>> -       ldr     s31, [x1]
> >>>>>>>> -       str     s31, [x0], 4
> >>>>>>>> -       cmp     x0, x2
> >>>>>>>> +       str     q31, [x0], 16
> >>>>>>>> +       cmp     x0, x1
> >>>>>>>>     bne     .L2
> >>>>>>>>
> >>>>>>>> gcc/ChangeLog:
> >>>>>>>
> >>>>>>> So refs_independent_p has no particular oder on the arguments it
> >>>>>>> receives and both could be stores.  It seems to me the special-casing
> >>>>>>> would be better done in ref_indep_loop_p for the kind == lim_raw
> >>>>>>> only.  Also
> >>>>>>>
> >>>>>>> +      if (((ref1->loaded && !ref1->stored && ref2->stored)
> >>>>>>> +          || (ref2->loaded && !ref2->stored && ref1->stored))
> >>>>>>> +         && is_self_write (ref1, ref2))
> >>>>>>> +       {
> >>>>>>>
> >>>>>>> you seem to imply both orders are handled, but is_self_write only
> >>>>>>> handles ref1 being the load.  Handing this upthread makes the
> >>>>>>> order obvious.
> >>>>>>>
> >>>>>>> So can you see to move this up?  Otherwise I think this should be OK.
> >>>>>>> I'll note this does not only handle the "obvious" case but also
> >>>>>>
> >>>>>> In the attached patched, I have moved based on the suggestion.
> >>>>>> Bootstrappped and regression tested on aarch64-linux-gnu with no New 
> >>>>>> regressions.
> >>>>>> Is this OK?
> >>>>>
> >>>>> It seems you attached the old patch?
> >>>>
> >>>> Apologies. I sent the wrong patch. Here is the correct one.
> >>>
> >>> This seems to be an incremental patch of some sorts.  Applying some 
> >>> "guessing"
> >>> the overall thing looks OK, I'd probably try micro-optimizing this
> >>> timing critical
> >>> part a bit by pre-computing load_stmt outside of the refs_to_check 
> >>> iteration
> >>> and only call is_self_write if that was successful (lim_raw && single
> >>> access in loop).
> >>>
> >>> Please send an updated and complete patch.  You might want to 
> >>> double-check it
> >>> before posting ;)
> >>>
> >> Apologies for the mixup again. Here is the patch with added check for 
> >> ref1->loaded && ref2->stored
> >
> > This is the original patch again ....
> >
> > So NACK.
> >
> > :/
> >
> > Please triple-check what you attach.
>
> Oops, Sorry. No excuse for this stupidity from my end. Thanks again for your 
> patience.
> I have revised it. Hopefully I didn’t make any mistake this time.
>
> Bootstrapped and regression tested on aarch64-linux-gnu with no New

OK.

Thanks,
Richard.

> Thanks,
> Kugan
>
>
> >
> > Richard.
> >
> >> Bootstrapped and regression tests on aarch64-linuc-gnu with no new 
> >> regressions.
> >>
> >> Thanks,
> >> Kugan
> >>
> >>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>> Thanks,
> >>>> Kugan
> >>>>
> >>>>>
> >>>>>> Thanks,
> >>>>>> Kugan
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> void foo(int *p, int *q, int j)
> >>>>>>> {
> >>>>>>> for  (int i =0; i < N; ++i)
> >>>>>>> p[i] = q[j];
> >>>>>>> }
> >>>>>>>
> >>>>>>> Richard.
> >>>>>>>
> >>>>>>>>     * tree-ssa-loop-im.cc (is_self_write): New.
> >>>>>>>>     (refs_independent_p): Allow hoisting when aliasing references
> >>>>>>>>     form a self write pattern.
> >>>>>>>>
> >>>>>>>> gcc/testsuite/ChangeLog:
> >>>>>>>>
> >>>>>>>>     * gcc.dg/vect/vect-licm-hoist-1.c: New.
> >>>>>>>>     * gcc.dg/vect/vect-licm-hoist-2.c: Likewise.
> >>>>>>>>
> >>>>>>>> Bootstrappped and regression tested on aarch64-linux-gnu with no New
> >>>>>>>> regressions.
> >>>>>>>>
> >>>>>>>> Is this Ok?
> >>>>>>>>
> >>>>>>>> Signed-off-by: Kugan Vivekanandarajah <[email protected]>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Kugan
>
>

Reply via email to