Re: [PATCH] aarch64: Improve svdupq_lane expension for big-endian [PR121293]

Richard Sandiford Wed, 30 Jul 2025 04:33:30 -0700

Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Tuesday, July 29, 2025 1:43 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Alex Coplan <alex.cop...@arm.com>; Alice Carlotti 
>> <alice.carlo...@arm.com>;
>> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw
>> <richard.earns...@arm.com>; Tamar Christina <tamar.christ...@arm.com>;
>> Wilco Dijkstra <wilco.dijks...@arm.com>
>> Subject: [PATCH] aarch64: Improve svdupq_lane expension for big-endian
>> [PR121293]
>> 
>> If the index to svdupq_lane is variable, or is outside the range of DUP,
>
> Minor nit, perhaps this should be explicitly DUPQ explicitly? DUP had me 
> confused
> for a bit.


I changed it to "the .Q form of DUP".  This is one of those cases
where a later development in the architecture conflicted with some
choices made for the original SVE ACLE:

- svdupq is a synthetic intrinsic that duplicates 128 bits' worth of
  elements to fill a full SVE vector.

- svdupq_lane is an SVE intrinsic that maps to the .Q form of DUP (indexed).

- svdup_laneq is an SVE2p1/SME2p1 intrinsic that maps to DUPQ.

In hindsight it would have been more future-proof to avoid adding characters
to the "mnemonic" part of the intrinsic name and instead use underscore
suffixes.

Thanks,
Richard

>
>> the fallback expansion is to convert to VNx2DI and use TBL.  The problem
>> in this PR was that the conversion used subregs, and on big-endian
>> targets, a bitcast from VNx2DI to another element size requires a
>> REV[BHW] in the best case or a spill and reload in the worst case.
>> (See the comment at the head of aarch64-sve.md for details.)
>> 
>> Here we want the conversion to act like svreinterpret, so it should
>> use aarch64_sve_reinterpret instead of subregs.
>> 
>> Tested on aarch64-linux-gnu.  OK to install?
>> 
>
> OK thanks,
> Tamar
>
>> Richard
>> 
>> 
>> gcc/
>>      PR target/121293
>>      * config/aarch64/aarch64-sve-builtins-base.cc (svdupq_lane::expand):
>>      Use aarch64_sve_reinterpret instead of subregs.  Explicitly
>>      reinterpret the result back to the required mode, rather than
>>      leaving the caller to take a subreg.
>> 
>> gcc/testsuite/
>>      PR target/121293
>>      * gcc.target/aarch64/sve/acle/general/dupq_lane_9.c: New test.
>> ---
>>  gcc/config/aarch64/aarch64-sve-builtins-base.cc           | 5 +++--
>>  .../gcc.target/aarch64/sve/acle/general/dupq_lane_9.c     | 8 ++++++++
>>  2 files changed, 11 insertions(+), 2 deletions(-)
>>  create mode 100644
>> gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index b4396837c24..32cce97b220 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -1259,9 +1259,10 @@ public:
>>      index = target;
>>        }
>> 
>> -    e.args[0] = gen_lowpart (VNx2DImode, e.args[0]);
>> +    e.args[0] = aarch64_sve_reinterpret (VNx2DImode, e.args[0]);
>>      e.args[1] = index;
>> -    return e.use_exact_insn (CODE_FOR_aarch64_sve_tblvnx2di);
>> +    rtx res = e.use_exact_insn (CODE_FOR_aarch64_sve_tblvnx2di);
>> +    return aarch64_sve_reinterpret (mode, res);
>>    }
>>  };
>> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c
>> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c
>> new file mode 100644
>> index 00000000000..e3f352bb8c2
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_lane_9.c
>> @@ -0,0 +1,8 @@
>> +/* { dg-options "-O2 -mbig-endian" } */
>> +
>> +#pragma GCC aarch64 "arm_sve.h"
>> +
>> +svint32_t f(svint32_t x) { return svdupq_lane (x, 17); }
>> +void g(svint32_t *a, svint32_t *b) { *a = svdupq_lane (*b, 17); }
>> +
>> +/* { dg-final { scan-assembler-not {\trevw\t} } } */
>> --
>> 2.43.0

Re: [PATCH] aarch64: Improve svdupq_lane expension for big-endian [PR121293]

Reply via email to