Richard Sandiford <[email protected]> writes: > Artemiy Volkov <[email protected]> writes: >> SVE2.2 (or in streaming mode, SME2.2) adds support for zeroing >> predication for the following SVE bit reversal instructions: >> >> - REVB, REVH, REVW (Reverse bytes / halfwords / words within elements >> (predicated)) >> - REVD (Reverse 64-bit doublewords in elements (predicated)) (SVE2 only) >> >> The first three are covered by the SVE_INT_UNARY code iterator, and REVD, >> being SVE2-only, has a standalone pattern in aarch64-sve2.md. This patch >> adds an alternative for the zeroing-predication forms of the original >> instructions. The pattern for REVD also required changes to the predicate >> for operand 3 to accept constant zero RTX. >> >> The tests that have been added are based on the original SVE/SVE2 tests >> for corresponding instructions, but all have a "_z" suffix in their name >> since they only test codegen for the "_z" variants of the corresponding >> intrinsics. >> >> gcc/ChangeLog: >> >> * config/aarch64/aarch64-sve.md (@cond_<optab><mode>): >> New alternative for zeroing predication. Add `arch` attribute >> to every alternative. >> * config/aarch64/aarch64-sve2.md (@cond_<optab><mode>): >> Accept constant zero as operand 3. New alternative for zeroing >> predication. Add `arch` attribute to every alternative. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/sve2/acle/asm/revb_s16_z.c: New test. >> * gcc.target/aarch64/sve2/acle/asm/revb_s32_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revb_s64_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revb_u16_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revb_u32_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revb_u64_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_bf16_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_f16_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_f32_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_f64_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_s16_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_s32_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_s64_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_s8_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_u16_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_u32_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_u64_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revd_u8_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revh_s32_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revh_s64_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revh_u32_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revh_u64_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revw_s64_z.c: Likewise. >> * gcc.target/aarch64/sve2/acle/asm/revw_u64_z.c: Likewise. >> --- >> gcc/config/aarch64/aarch64-sve.md | 9 ++-- >> gcc/config/aarch64/aarch64-sve2.md | 9 ++-- >> .../aarch64/sve2/acle/asm/revb_s16_z.c | 30 ++++++++++++ >> .../aarch64/sve2/acle/asm/revb_s32_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revb_s64_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revb_u16_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revb_u32_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revb_u64_z.c | 29 ++++++++++++ >> .../aarch64/sve2/acle/asm/revd_bf16_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_f16_z.c | 47 +++++++++++++++++++ >> .../aarch64/sve2/acle/asm/revd_f32_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_f64_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_s16_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_s32_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_s64_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_s8_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_u16_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_u32_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_u64_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revd_u8_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revh_s32_z.c | 30 ++++++++++++ >> .../aarch64/sve2/acle/asm/revh_s64_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revh_u32_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revh_u64_z.c | 30 ++++++++++++ >> .../aarch64/sve2/acle/asm/revw_s64_z.c | 28 +++++++++++ >> .../aarch64/sve2/acle/asm/revw_u64_z.c | 29 ++++++++++++ >> 26 files changed, 709 insertions(+), 8 deletions(-) >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revb_s16_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revb_s32_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revb_s64_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revb_u16_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revb_u32_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revb_u64_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_bf16_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_f16_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_f32_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_f64_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_s16_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_s32_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_s64_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_s8_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_u16_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_u32_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_u64_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_u8_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revh_s32_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revh_s64_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revh_u32_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revh_u64_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revw_s64_z.c >> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revw_u64_z.c >> >> diff --git a/gcc/config/aarch64/aarch64-sve.md >> b/gcc/config/aarch64/aarch64-sve.md >> index 4b4e6cec549..79f0136101a 100644 >> --- a/gcc/config/aarch64/aarch64-sve.md >> +++ b/gcc/config/aarch64/aarch64-sve.md >> @@ -3531,10 +3531,11 @@ >> (match_operand:SVE_FULL_I 3 "aarch64_simd_reg_or_zero")] >> UNSPEC_SEL))] >> "TARGET_SVE && <elem_bits> >= <min_elem_bits>" >> - {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] >> - [ w , Upl , w , 0 ; * ] >> <sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype> >> - [ ?&w , Upl , w , Dz ; yes ] movprfx\t%0.<Vetype>, >> %1/z, %2.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype> >> - [ ?&w , Upl , w , w ; yes ] movprfx\t%0, >> %3\;<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype> >> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx, arch ] >> + [ w , Upl , w , 0 ; * , * ] >> <sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype> >> + [ w , Upl , w , Dz ; * , sve2p2_or_sme2p2 ] >> <sve_int_op>\t%0.<Vetype>, %1/z, %2.<Vetype> >> + [ ?&w , Upl , w , Dz ; yes , * ] >> movprfx\t%0.<Vetype>, %1/z, %2.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, >> %2.<Vetype> >> + [ ?&w , Upl , w , w ; yes , * ] movprfx\t%0, >> %3\;<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype> >> } >> [(set_attr "sve_type" "sve_int_general")] >> ) >> diff --git a/gcc/config/aarch64/aarch64-sve2.md >> b/gcc/config/aarch64/aarch64-sve2.md >> index 017efb3cf5e..45e8b0a02a4 100644 >> --- a/gcc/config/aarch64/aarch64-sve2.md >> +++ b/gcc/config/aarch64/aarch64-sve2.md >> @@ -4144,12 +4144,13 @@ >> (unspec:SVE_FULL >> [(match_operand:SVE_FULL 2 "register_operand")] >> UNSPEC_REVD_ONLY) >> - (match_operand:SVE_FULL 3 "register_operand")] >> + (match_operand:SVE_FULL 3 "aarch64_simd_reg_or_zero")] >> UNSPEC_SEL))] >> "TARGET_SVE2p1_OR_SME" >> - {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] >> - [ w , Upl , w , 0 ; * ] revd\t%0.q, %1/m, %2.q >> - [ ?&w , Upl , w , w ; yes ] movprfx\t%0, >> %3\;revd\t%0.q, %1/m, %2.q >> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx, arch ] >> + [ w , Upl , w , 0 ; * , * ] revd\t%0.q, %1/m, >> %2.q >> + [ w , Upl , w , Dz ; * , sve2p2_or_sme2p2 ] revd\t%0.q, %1/z, >> %2.q >> + [ ?&w , Upl , w , w ; yes , * ] movprfx\t%0, >> %3\;revd\t%0.q, %1/m, %2.q > > This looks behaviourally correct, but the predicate change will have the > effect of allowing zero for SVE2p1 before register allocation, leaving the > allocator to find a spare register at the last minute. It'd probably be > better to reject zero in the C++ condition for !TARGET_SVE2p2_OR_SME2p2, > or add a new predicate that only allows zero for TARGET_SVE2p2_OR_SME2p2.
Sorry, saw later that Kyrill made the same comment in response to patch 8.
