> -----Original Message----- > From: Richard Sandiford <richard.sandif...@arm.com> > Sent: Tuesday, July 29, 2025 1:43 PM > To: gcc-patches@gcc.gnu.org > Cc: Alex Coplan <alex.cop...@arm.com>; Alice Carlotti > <alice.carlo...@arm.com>; > pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw > <richard.earns...@arm.com>; Tamar Christina <tamar.christ...@arm.com>; > Wilco Dijkstra <wilco.dijks...@arm.com> > Subject: [PATCH] aarch64: Improve svdupq_lane expension for big-endian > [PR121293] > > If the index to svdupq_lane is variable, or is outside the range of DUP,
Minor nit, perhaps this should be explicitly DUPQ explicitly? DUP had me confused for a bit. > the fallback expansion is to convert to VNx2DI and use TBL. The problem > in this PR was that the conversion used subregs, and on big-endian > targets, a bitcast from VNx2DI to another element size requires a > REV[BHW] in the best case or a spill and reload in the worst case. > (See the comment at the head of aarch64-sve.md for details.) > > Here we want the conversion to act like svreinterpret, so it should > use aarch64_sve_reinterpret instead of subregs. > > Tested on aarch64-linux-gnu. OK to install? > OK thanks, Tamar > Richard > > > gcc/ > PR target/121293 > * config/aarch64/aarch64-sve-builtins-base.cc (svdupq_lane::expand): > Use aarch64_sve_reinterpret instead of subregs. Explicitly > reinterpret the result back to the required mode, rather than > leaving the caller to take a subreg. > > gcc/testsuite/ > PR target/121293 > * gcc.target/aarch64/sve/acle/general/dupq_lane_9.c: New test. > --- > gcc/config/aarch64/aarch64-sve-builtins-base.cc | 5 +++-- > .../gcc.target/aarch64/sve/acle/general/dupq_lane_9.c | 8 ++++++++ > 2 files changed, 11 insertions(+), 2 deletions(-) > create mode 100644 > gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > index b4396837c24..32cce97b220 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > @@ -1259,9 +1259,10 @@ public: > index = target; > } > > - e.args[0] = gen_lowpart (VNx2DImode, e.args[0]); > + e.args[0] = aarch64_sve_reinterpret (VNx2DImode, e.args[0]); > e.args[1] = index; > - return e.use_exact_insn (CODE_FOR_aarch64_sve_tblvnx2di); > + rtx res = e.use_exact_insn (CODE_FOR_aarch64_sve_tblvnx2di); > + return aarch64_sve_reinterpret (mode, res); > } > }; > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c > b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c > new file mode 100644 > index 00000000000..e3f352bb8c2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c > @@ -0,0 +1,8 @@ > +/* { dg-options "-O2 -mbig-endian" } */ > + > +#pragma GCC aarch64 "arm_sve.h" > + > +svint32_t f(svint32_t x) { return svdupq_lane (x, 17); } > +void g(svint32_t *a, svint32_t *b) { *a = svdupq_lane (*b, 17); } > + > +/* { dg-final { scan-assembler-not {\trevw\t} } } */ > -- > 2.43.0