> -----Original Message----- > From: Tamar Christina <tamar.christ...@arm.com> > Sent: Thursday, February 15, 2024 8:27 AM > To: Richard Sandiford <richard.sandif...@arm.com>; Andrew Pinski (QUIC) > <quic_apin...@quicinc.com> > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by adding > whole vector shift right [PR113872] > > > -----Original Message----- > > From: Richard Sandiford <richard.sandif...@arm.com> > > Sent: Thursday, February 15, 2024 2:56 PM > > To: Andrew Pinski <quic_apin...@quicinc.com> > > Cc: gcc-patches@gcc.gnu.org; Tamar Christina <tamar.christ...@arm.com> > > Subject: Re: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by > > adding whole vector shift right [PR113872] > > > > Andrew Pinski <quic_apin...@quicinc.com> writes: > > > The backend currently defines a whole vector shift left for 64bit > > > vectors, adding > > the > > > shift right can also improve code for some PERMs too. So this adds that > pattern. > > > > Is this reversed? It looks like we have the shift right and the patch > > is adding the shift left (at least in GCC internal and little-endian terms). > > > > But on many Arm cores, EXT has a higher throughput than SHL, so I > > don't think we should do this unconditionally. > > Yeah, on most (if not all) all Arm cores the EXT has higher throughput than > SHL > and on Cortex-A75 the EXT has both higher throughput and lower latency. > > I guess the expected gain here is that we wouldn't need to create the zero > vector, However on modern Arm cores the zero vector creation is free using > movi and EXT being three operand also means we only need one copy if e.g in > a loop.
That might be true on Arm's cores but that is not true on Quacom's cores which I am working with. EXT and SHL have the same throughput and latency but movi is not free (in that it still not done in the renamer) and a register is definitely not free if there is huge register pressure. So I think we will need to figure out a way to have this as a tuning mechanism. Thanks, Andrew Pinski > > Kind Regards, > Tamar > > > > > Thanks, > > Richard > > > > > > > > I added a testcase for the shift left also. I also fixed the > > > instruction template there which was using a space instead of a tab after > the instruction. > > > > > > Built and tested on aarch64-linux-gnu. > > > > > > PR target/113872 > > > > > > gcc/ChangeLog: > > > > > > * config/aarch64/aarch64-simd.md > (vec_shr_<mode><vczle><vczbe>): > > Use tab instead of space after > > > the instruction in the template. > > > (vec_shl_<mode><vczle><vczbe>): New pattern > > > * config/aarch64/iterators.md (unspec): Add UNSPEC_VEC_SHL > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/aarch64/perm_zero-1.c: New test. > > > * gcc.target/aarch64/perm_zero-2.c: New test. > > > > > > Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com> > > > --- > > > gcc/config/aarch64/aarch64-simd.md | 18 ++++++++++++++++-- > > > gcc/config/aarch64/iterators.md | 1 + > > > gcc/testsuite/gcc.target/aarch64/perm_zero-1.c | 15 +++++++++++++++ > > > gcc/testsuite/gcc.target/aarch64/perm_zero-2.c | 15 +++++++++++++++ > > > 4 files changed, 47 insertions(+), 2 deletions(-) create mode > > > 100644 gcc/testsuite/gcc.target/aarch64/perm_zero-1.c > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/perm_zero-2.c > > > > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > > b/gcc/config/aarch64/aarch64-simd.md > > > index f8bb973a278..0d2f1ea3902 100644 > > > --- a/gcc/config/aarch64/aarch64-simd.md > > > +++ b/gcc/config/aarch64/aarch64-simd.md > > > @@ -1592,9 +1592,23 @@ (define_insn > "vec_shr_<mode><vczle><vczbe>" > > > "TARGET_SIMD" > > > { > > > if (BYTES_BIG_ENDIAN) > > > - return "shl %d0, %d1, %2"; > > > + return "shl\t%d0, %d1, %2"; > > > else > > > - return "ushr %d0, %d1, %2"; > > > + return "ushr\t%d0, %d1, %2"; > > > + } > > > + [(set_attr "type" "neon_shift_imm")] > > > +) > > > +(define_insn "vec_shl_<mode><vczle><vczbe>" > > > + [(set (match_operand:VD 0 "register_operand" "=w") > > > + (unspec:VD [(match_operand:VD 1 "register_operand" "w") > > > + (match_operand:SI 2 "immediate_operand" "i")] > > > + UNSPEC_VEC_SHL))] > > > + "TARGET_SIMD" > > > + { > > > + if (BYTES_BIG_ENDIAN) > > > + return "ushr\t%d0, %d1, %2"; > > > + else > > > + return "shl\t%d0, %d1, %2"; > > > } > > > [(set_attr "type" "neon_shift_imm")] > > > ) > > > diff --git a/gcc/config/aarch64/iterators.md > > > b/gcc/config/aarch64/iterators.md index 99cde46f1ba..3aebe9cf18a > > > 100644 > > > --- a/gcc/config/aarch64/iterators.md > > > +++ b/gcc/config/aarch64/iterators.md > > > @@ -758,6 +758,7 @@ (define_c_enum "unspec" > > > UNSPEC_PMULL ; Used in aarch64-simd.md. > > > UNSPEC_PMULL2 ; Used in aarch64-simd.md. > > > UNSPEC_REV_REGLIST ; Used in aarch64-simd.md. > > > + UNSPEC_VEC_SHL ; Used in aarch64-simd.md. > > > UNSPEC_VEC_SHR ; Used in aarch64-simd.md. > > > UNSPEC_SQRDMLAH ; Used in aarch64-simd.md. > > > UNSPEC_SQRDMLSH ; Used in aarch64-simd.md. > > > diff --git a/gcc/testsuite/gcc.target/aarch64/perm_zero-1.c > > b/gcc/testsuite/gcc.target/aarch64/perm_zero-1.c > > > new file mode 100644 > > > index 00000000000..3c8f0591a2f > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/aarch64/perm_zero-1.c > > > @@ -0,0 +1,15 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2" } */ > > > +/* PR target/113872 */ > > > +/* For 64bit vectors, PERM with a constant 0 should produce a shift > > > +instead of > > the ext instruction. */ > > > + > > > +#define vect64 __attribute__((vector_size(8))) > > > + > > > +void f(vect64 unsigned short *a) > > > +{ > > > + *a = __builtin_shufflevector((vect64 unsigned short){0},*a, > > > +3,4,5,6); } > > > + > > > +/* { dg-final { scan-assembler-times "ushr\t" 1 { target > > > +aarch64_big_endian } } } > > */ > > > +/* { dg-final { scan-assembler-times "shl\t" 1 { target > > > +aarch64_little_endian } } } > > */ > > > +/* { dg-final { scan-assembler-not "ext\t" } } */ > > > diff --git a/gcc/testsuite/gcc.target/aarch64/perm_zero-2.c > > b/gcc/testsuite/gcc.target/aarch64/perm_zero-2.c > > > new file mode 100644 > > > index 00000000000..970e428f832 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/aarch64/perm_zero-2.c > > > @@ -0,0 +1,15 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2" } */ > > > +/* PR target/113872 */ > > > +/* For 64bit vectors, PERM with a constant 0 should produce a shift > > > +instead of > > the ext instruction. */ > > > + > > > +#define vect64 __attribute__((vector_size(8))) > > > + > > > +void f(vect64 unsigned short *a) > > > +{ > > > + *a = __builtin_shufflevector(*a, (vect64 unsigned > > > +short){0},3,4,5,6); } > > > + > > > +/* { dg-final { scan-assembler-times "shl\t" 1 { target > > > +aarch64_big_endian } } } */ > > > +/* { dg-final { scan-assembler-times "ushr\t" 1 { target > > > +aarch64_little_endian } } > > } */ > > > +/* { dg-final { scan-assembler-not "ext\t" } } */