> On 29 Apr 2025, at 18:21, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> With -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
>> ptrue predicate can be replaced by Advanced SIMD (Neon) instructions
>> (LDR and STR), thus avoiding the predicate altogether. This also enables
>> the formation of LDP/STP pairs.
>> 
>> For example, the test cases
>> 
>> svfloat64_t
>> ptrue_load (float64_t *x)
>> {
>>  svbool_t pg = svptrue_b64 ();
>>  return svld1_f64 (pg, x);
>> }
>> void
>> ptrue_store (float64_t *x, svfloat64_t data)
>> {
>>  svbool_t pg = svptrue_b64 ();
>>  svst1_f64 (pg, x, data);
>> }
>> 
>> were previously compiled to
>> (with -O2 -march=armv8.2-a+sve -msve-vector-bits=128):
>> 
>> ptrue_load:
>>        ptrue   p3.b, vl16
>>        ld1d    z0.d, p3/z, [x0]
>>        ret
>> ptrue_store:
>>        ptrue   p3.b, vl16
>>        st1d    z0.d, p3, [x0]
>>        ret
>> 
>> Now they are compiled to:
>> 
>> ptrue_load:
>>        ldr     q0, [x0]
>>        ret
>> ptrue_store:
>>        str     q0, [x0]
>>        ret
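To see why the fold is safe, the C sketch below models a 128-bit SVE vector of doubles as exactly two lanes (an assumption matching -msve-vector-bits=128). The functions model_ld1d, model_st1d and model_ldr_q are hypothetical scalar stand-ins, not GCC or ACLE APIs. A zeroing predicated load (p/z) with an all-true predicate reads every lane, so it yields the same bits as an unpredicated 16-byte load; the store case is analogous.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: -msve-vector-bits=128, so an svfloat64_t has 2 lanes.  */
#define MODEL_LANES 2

typedef struct { double lane[MODEL_LANES]; } model_vec_f64;

/* Hypothetical model of LD1D z0.d, p/z, [x0]: active lanes are loaded
   from memory, inactive lanes are zeroed.  */
model_vec_f64 model_ld1d (const bool pg[MODEL_LANES], const double *x)
{
  model_vec_f64 v;
  for (size_t i = 0; i < MODEL_LANES; i++)
    v.lane[i] = pg[i] ? x[i] : 0.0;
  return v;
}

/* Hypothetical model of ST1D z0.d, p, [x0]: only active lanes are
   written; memory for inactive lanes is left untouched.  */
void model_st1d (const bool pg[MODEL_LANES], double *x, model_vec_f64 v)
{
  for (size_t i = 0; i < MODEL_LANES; i++)
    if (pg[i])
      x[i] = v.lane[i];
}

/* Hypothetical model of LDR q0, [x0]: an unpredicated 16-byte load.  */
model_vec_f64 model_ldr_q (const double *x)
{
  model_vec_f64 v;
  memcpy (v.lane, x, sizeof v.lane);
  return v;
}

/* Returns true if the all-true-predicate LD1D model produces the same
   bytes as the LDR q model for the given memory.  */
bool model_ptrue_load_matches_ldr (const double *x)
{
  const bool ptrue[MODEL_LANES] = { true, true };
  model_vec_f64 a = model_ld1d (ptrue, x);
  model_vec_f64 b = model_ldr_q (x);
  return memcmp (a.lane, b.lane, sizeof a.lane) == 0;
}
```

With a partial predicate the two models differ (inactive lanes read as zero, or are left unwritten on a store), which is why the replacement only applies when the predicate is a ptrue.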
>> 
>> The implementation includes the if-statement
>>   if (known_eq (GET_MODE_SIZE (mode), 16)
>>       && aarch64_classify_vector_mode (mode) == VEC_SVE_DATA)
>> which checks for 128-bit VLS (GET_MODE_SIZE is in bytes) and excludes
>> partial modes whose size is less than 128 bits (e.g. VNx2QI).
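The shape of that condition can be illustrated with a small standalone C model. Everything here is hypothetical: model_mode, the MODEL_* flags and model_fold_to_ldr_str_p only loosely mirror GET_MODE_SIZE and aarch64_classify_vector_mode and are not GCC code. Sizes are in bytes, so 16 corresponds to 128 bits.

```c
#include <stdbool.h>

/* Hypothetical, simplified classification flags, loosely modelled on
   the categories aarch64_classify_vector_mode distinguishes.  */
enum model_vec_flags
{
  MODEL_SVE_DATA = 1 << 0,  /* SVE data vector */
  MODEL_SVE_PRED = 1 << 1,  /* SVE predicate */
  MODEL_PARTIAL  = 1 << 2   /* partial vector, e.g. VNx2QI */
};

typedef struct
{
  const char *name;
  unsigned size_bytes;      /* mode size in bytes for the chosen VL */
  unsigned flags;           /* combination of model_vec_flags */
} model_mode;

/* Mirrors the shape of the patch's check: the mode must be exactly
   16 bytes (128 bits) and classified as a full SVE data vector, with
   no partial or predicate bits set.  */
bool model_fold_to_ldr_str_p (const model_mode *m)
{
  return m->size_bytes == 16 && m->flags == MODEL_SVE_DATA;
}
```

Requiring the classification to equal the full-data value exactly, rather than merely contain it, rejects partial and predicate modes independently of the size check, while the size check rejects full data modes when the vector length is not 128 bits.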
>> 
>> The patch was bootstrapped and tested on aarch64-linux-gnu with no regressions.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>> 
>> gcc/
>>      * config/aarch64/aarch64.cc (aarch64_emit_sve_pred_move):
>>      Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS.
>> 
>> gcc/testsuite/
>>      * gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c: New test.
>>      * gcc.target/aarch64/sve/cond_arith_6.c: Adjust expected outcome.
>>      * gcc.target/aarch64/sve/pcs/return_4_128.c: Likewise.
>>      * gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
>>      * gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
>> ---
>> gcc/config/aarch64/aarch64.cc                 | 29 ++++++++--
>> .../gcc.target/aarch64/sve/cond_arith_6.c     |  3 +-
>> .../aarch64/sve/ldst_ptrue_128_to_neon.c      | 48 ++++++++++++++++
>> .../gcc.target/aarch64/sve/pcs/return_4_128.c | 39 +++++--------
>> .../gcc.target/aarch64/sve/pcs/return_5_128.c | 39 +++++--------
>> .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 56 +++++++------------
>> 6 files changed, 118 insertions(+), 96 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c
> 
> OK, thanks.
Thanks, pushed to trunk: 83bb288faa39a0bf5ce2d62e21a090a130d8dda4
Jennifer
> 
> Richard
