Tamar Christina <tamar.christ...@arm.com> writes: >> -----Original Message----- >> From: Richard Sandiford <richard.sandif...@arm.com> >> Sent: Tuesday, July 29, 2025 1:43 PM >> To: gcc-patches@gcc.gnu.org >> Cc: Alex Coplan <alex.cop...@arm.com>; Alice Carlotti >> <alice.carlo...@arm.com>; >> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw >> <richard.earns...@arm.com>; Tamar Christina <tamar.christ...@arm.com>; >> Wilco Dijkstra <wilco.dijks...@arm.com> >> Subject: [PATCH] aarch64: Improve svdupq_lane expension for big-endian >> [PR121293] >> >> If the index to svdupq_lane is variable, or is outside the range of DUP, > > Minor nit, perhaps this should be explicitly DUPQ explicitly? DUP had me > confused > for a bit.
I changed it to "the .Q form of DUP". This is one of those cases where a later development in the architecture conflicted with some choices made for the original SVE ACLE: - svdupq is a synthetic intrinsic that duplicates 128 bits' worth of elements to fill a full SVE vector. - svdupq_lane is an SVE intrinsic that maps to the .Q form of DUP (indexed). - svdup_laneq is an SVE2p1/SME2p1 intrinsic that maps to DUPQ. In hindsight it would have been more future-proof to avoid adding characters to the "mnemonic" part of the intrinsic name and instead use underscore suffixes. Thanks, Richard > >> the fallback expansion is to convert to VNx2DI and use TBL. The problem >> in this PR was that the conversion used subregs, and on big-endian >> targets, a bitcast from VNx2DI to another element size requires a >> REV[BHW] in the best case or a spill and reload in the worst case. >> (See the comment at the head of aarch64-sve.md for details.) >> >> Here we want the conversion to act like svreinterpret, so it should >> use aarch64_sve_reinterpret instead of subregs. >> >> Tested on aarch64-linux-gnu. OK to install? >> > > OK thanks, > Tamar > >> Richard >> >> >> gcc/ >> PR target/121293 >> * config/aarch64/aarch64-sve-builtins-base.cc (svdupq_lane::expand): >> Use aarch64_sve_reinterpret instead of subregs. Explicitly >> reinterpret the result back to the required mode, rather than >> leaving the caller to take a subreg. >> >> gcc/testsuite/ >> PR target/121293 >> * gcc.target/aarch64/sve/acle/general/dupq_lane_9.c: New test. >> --- >> gcc/config/aarch64/aarch64-sve-builtins-base.cc | 5 +++-- >> .../gcc.target/aarch64/sve/acle/general/dupq_lane_9.c | 8 ++++++++ >> 2 files changed, 11 insertions(+), 2 deletions(-) >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c >> >> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc >> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc >> index b4396837c24..32cce97b220 100644 >> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc >> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc >> @@ -1259,9 +1259,10 @@ public: >> index = target; >> } >> >> - e.args[0] = gen_lowpart (VNx2DImode, e.args[0]); >> + e.args[0] = aarch64_sve_reinterpret (VNx2DImode, e.args[0]); >> e.args[1] = index; >> - return e.use_exact_insn (CODE_FOR_aarch64_sve_tblvnx2di); >> + rtx res = e.use_exact_insn (CODE_FOR_aarch64_sve_tblvnx2di); >> + return aarch64_sve_reinterpret (mode, res); >> } >> }; >> >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c >> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c >> new file mode 100644 >> index 00000000000..e3f352bb8c2 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c >> @@ -0,0 +1,8 @@ >> +/* { dg-options "-O2 -mbig-endian" } */ >> + >> +#pragma GCC aarch64 "arm_sve.h" >> + >> +svint32_t f(svint32_t x) { return svdupq_lane (x, 17); } >> +void g(svint32_t *a, svint32_t *b) { *a = svdupq_lane (*b, 17); } >> + >> +/* { dg-final { scan-assembler-not {\trevw\t} } } */ >> -- >> 2.43.0