> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Tuesday, July 29, 2025 5:20 PM
> To: Alex Coplan <alex.cop...@arm.com>; Alice Carlotti
> <alice.carlo...@arm.com>; pins...@gmail.com; ktkac...@nvidia.com;
> Richard Earnshaw <richard.earns...@arm.com>;
> Tamar Christina <tamar.christ...@arm.com>;
> Wilco Dijkstra <wilco.dijks...@arm.com>; gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <richard.sandif...@arm.com>
> Subject: [PATCH 0/2] aarch64: Two fixes for PR121294
>
> One long-standing problem with the implementation of the SVE ACLE
> is that .H, .S, and .D predicate operations tend to have VNx8BI,
> VNx4BI, and VNx2BI results.  As with the fix for PR121118, this
> representation is usually incorrect, since every bit of an svbool_t
> result is significant:
>
>    https://gcc.gnu.org/pipermail/gcc-patches/2025-July/691024.html
>
> In PR121294, this representation actively leads to wrong code.
> .H, .S, and .D permutations operate on 2-bit, 4-bit, and 8-bit
> predicate elements, but they copy all bits across verbatim.
> That isn't something we need or rely on when permuting natural
> VNx8BI, VNx4BI, or VNx2BI predicates, but it is something that
> is guaranteed by the ACLE intrinsics.  The current representation
> instead allows RTL optimisers to substitute one type of ptrue
> for another, as long as the low bit of each element doesn't change.
>
> Tested on aarch64-linux-gnu.  OK for trunk and for backports?
>

Had a minor comment on one of the testcases, but otherwise OK.

Thanks!
Tamar

> Richard
>
> Richard Sandiford (2):
>   aarch64: Use VNx16BI for more permutations [PR121294]
>   aarch64: Use VNx16BI for svrev_b* [PR121294]
>
>  .../aarch64/aarch64-sve-builtins-base.cc      |  5 +-
>  .../aarch64/aarch64-sve-builtins-functions.h  |  5 +-
>  gcc/config/aarch64/aarch64-sve.md             | 62 ++++++++++--
>  gcc/config/aarch64/aarch64.cc                 |  3 +-
>  gcc/config/aarch64/aarch64.md                 |  1 +
>  gcc/config/aarch64/iterators.md               |  4 +-
>  .../aarch64/sve/acle/general/perm_2.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_3.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_4.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_5.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_6.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_7.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/rev_2.c          | 24 +++++
>  13 files changed, 666 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_4.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_5.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_6.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_7.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev_2.c
>
> --
> 2.43.0
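
For anyone reading along without the PR open, here is a rough portable C model of the point in the cover letter about .H permutations copying all bits of each 2-bit predicate element verbatim. It is only a sketch, not code from the patches or the new tests: it assumes a 128-bit vector length (so a 16-bit predicate), models just a reversal, and the names pred_t, rev_elems, low_bits and self_check are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model: a predicate for a 128-bit vector, one bit per byte lane. */
typedef uint16_t pred_t;
#define PRED_BITS 16u

/* Reverse the order of ELEM_BITS-wide predicate elements, copying every
   bit of each element verbatim (2 for .H, 4 for .S, 8 for .D).  */
static pred_t rev_elems(pred_t p, unsigned elem_bits)
{
    unsigned n = PRED_BITS / elem_bits;
    unsigned mask = (1u << elem_bits) - 1u;
    unsigned out = 0;
    for (unsigned i = 0; i < n; i++) {
        unsigned elem = ((unsigned)p >> (i * elem_bits)) & mask;
        out |= elem << ((n - 1u - i) * elem_bits);
    }
    return (pred_t)out;
}

/* Keep only the low bit of each element: the bits that a VNx8BI-style
   view (for elem_bits == 2) treats as significant.  */
static pred_t low_bits(pred_t p, unsigned elem_bits)
{
    unsigned keep = 0;
    for (unsigned i = 0; i < PRED_BITS; i += elem_bits)
        keep |= 1u << i;
    return (pred_t)(p & keep);
}

/* Two inputs that agree on the low bit of every .H element still give
   different svbool_t-style results after a .H reversal.  */
static void self_check(void)
{
    pred_t ptrue_b = 0xFFFF;  /* every bit set */
    pred_t ptrue_h = 0x5555;  /* low bit of each 2-bit element set */

    assert(low_bits(ptrue_b, 2) == low_bits(ptrue_h, 2));
    assert(rev_elems(ptrue_b, 2) != rev_elems(ptrue_h, 2));

    /* The upper bit of the first .H element travels with the element. */
    assert(rev_elems(0x0003, 2) == 0xC000);
    assert(rev_elems(0x0001, 2) == 0x4000);
}
```

In this model, swapping one input for the other is harmless if only the low bit of each element is observed, but it changes the full 16-bit result, which is what an svbool_t consumer sees.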