> On 22 Oct 2025, at 8:32 pm, Tamar Christina <[email protected]> wrote: > > External email: Use caution opening links or attachments > > >> -----Original Message----- >> From: Richard Biener <[email protected]> >> Sent: 22 October 2025 09:53 >> To: Kugan Vivekanandarajah <[email protected]> >> Cc: [email protected]; Tamar Christina <[email protected]> >> Subject: Re: [PATCH][tree-optimization/61338] - Optimize redundant reverse >> permutations in vectorized stores >> >> On Tue, Oct 21, 2025 at 11:57 PM Kugan Vivekanandarajah >> <[email protected]> wrote: >>> >>> Hi Richard, >>> >>> Thanks for the review. >>> >>>> On 15 Oct 2025, at 10:39 pm, Richard Biener >> <[email protected]> wrote: >>>> >>>> External email: Use caution opening links or attachments >>>> >>>> >>>> On Wed, Oct 15, 2025 at 12:08 AM Kugan Vivekanandarajah >>>> <[email protected]> wrote: >>>>> >>>>> Hi, >>>>> >>>>> This patch eliminates redundant reverse permutations in vectorized >> reverse >>>>> loops by detecting and optimizing patterns during store vectorization. >>>>> >>>>> The reverse load (b[i]) generates PERM, operations are applied, then the >>>>> reverse store adds another PERM. This creates redundant permute pairs >> that >>>>> we now detect and eliminate. >>>>> >>>>> With the patch, for the example loop >>>>> for (int i = N - 1; i >= 0; i--) >>>>> { >>>>> a[i] = b[i] + 1.0f; >>>>> } >>>>> Changes to the following >>>>> - ldr q29, [x0, x2] >>>>> - tbl v29.16b, {v29.16b}, v31.16b >>>>> - fadd v29.4s, v29.4s, v30.4s >>>>> - tbl v29.16b, {v29.16b}, v31.16b >>>>> - str q29, [x3, x2] >>>>> + ldr q30, [x0, x2] >>>>> + fadd v30.4s, v30.4s, v31.4s >>>>> + str q30, [x3, x2] >>>> >>>> So this works basically as a post-processing optimization at the time >>>> we generate the >>>> vector store. While that's in principle an OK optimization I'd rather >>>> have such post-processing >>>> implemented outside of the vectorizer because then also permutes not >>>> originating from >>>> vectorizer permuted store generation would benefit. >>>> >>>> As for implementing this in the vectorizer itself the more appropriate >>>> thing would be >>>> to expose these permutes to the permute optimization phase, because >> then it can >>>> be also taken into account during costing and a reverse load permute >>>> could be elided >>>> if it feeds an associatable reduction. >>>> >>>> There is, unfortunately, currently no good way to represent how we >> implement >>>> negative strided contiguous accesses with load permutations as the >> peculiarity >>>> only exposes itself after applying the VF and load/lane permutations are >>>> represented on the VF == 1 SLP graph. One of my ideas what that once we >>>> settle on VF (and possibly vector types) we want to expand the SLP graph >>>> to cover all lanes of the vector loop so we can expose actual permutes and >>>> vector granularity. This is a bit far off though. >>>> >>>> So in line with your patch but more appropriate for in-vectorizer >>>> operation would >>>> be an analysis on the SLP graph that simply marks reverse permutes that >> can >>>> be elided (for the back-to-back case). This way both costing and code >>>> generation >>>> can take this into account and you wouldn't have to adjust any stmts. >>> >>> I have now changed it to account for the costing. Bootstrapped and >> regression tested on aarch64-linux-gnu. >>> >>> Is this OK? >> >> Same here? >> > > Did you send the right version of the patch Kugan? It's identical to the one > you > sent before and also has some changes in gcc/fortran/resolve.cc not specified > and your changelog seems to have an incorrect format, the files containing > what > you changed aren't mentioned.
Apologies again. Here is the correct version. This also does not have the changes for resolve.cc <http://resolve.cc/>. Thanks, Kugan > > Thanks, > Tamar > >>> Thanks, >>> Kugan >>> >>> >>>> >>>> Thanks, >>>> Richard. >>>> >>>>> PR tree-optimization/61338 >>>>> >>>>> gcc/ChangeLog: >>>>> (get_vector_perm_operand): New. >>>>> (vect_find_reverse_permute_operand): New helper function >>>>> to find reverse permutations through element-wise operation chains. >>>>> Returns true only if ALL operands have reverse permutations. >>>>> (vectorizable_store): Use recursive helper to eliminate redundant >>>>> reverse permutations with configurable search depth. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> * gcc.dg/vect/slp-permute-reverse-1.c: New test for basic >>>>> reverse permute optimization (simple copy). >>>>> * gcc.dg/vect/slp-permute-reverse-2.c: New runtime test for >>>>> basic pattern. >>>>> Signed-off-by: Kugan Vivekanandarajah <[email protected]> >>>>> >>>>> Bootstrapped and regression tested on aarch64-linux-gcc. Is this OK? >>>>> >>>>> Thanks, >>>>> Kugan >>> >>>
0001-tree-optimization-61338-v2-Optimize-redundant-revers.patch
Description: 0001-tree-optimization-61338-v2-Optimize-redundant-revers.patch
