> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Tuesday, July 29, 2025 5:20 PM
> To: Alex Coplan <alex.cop...@arm.com>; Alice Carlotti <alice.carlo...@arm.com>;
> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw
> <richard.earns...@arm.com>; Tamar Christina <tamar.christ...@arm.com>;
> Wilco Dijkstra <wilco.dijks...@arm.com>; gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <richard.sandif...@arm.com>
> Subject: [PATCH 0/2] aarch64: Two fixes for PR121294
> 
> One long-standing problem with the implementation of the SVE ACLE
> is that .H, .S, and .D predicate operations tend to have VNx8BI,
> VNx4BI, and VNx2BI results.  As with the fix for PR121118, this
> representation is usually incorrect, since every bit of an svbool_t
> result is significant:
> 
>     https://gcc.gnu.org/pipermail/gcc-patches/2025-July/691024.html
> 
> In PR121294, this representation actively leads to wrong code.
> .H, .S, and .D permutations operate on 2-bit, 4-bit, and 8-bit
> predicate elements, but they copy all bits across verbatim.
> That isn't something we need or rely on when permuting natural
> VNx8BI, VNx4BI, or VNx2BI predicates, but it is something that
> is guaranteed by the ACLE intrinsics.  The current representation
> instead allows RTL optimisers to substitute one type of ptrue
> for another, as long as the low bit of each element doesn't change.
> 
> Tested on aarch64-linux-gnu.  OK for trunk and for backports?
> 
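To make the failure mode described above concrete, here is a toy model in
plain C. It is only a sketch: it does not use the SVE intrinsics and is not
taken from the patches. It assumes a 128-bit vector, so that a predicate is
16 bits with one bit per byte and a .H predicate element is 2 bits wide.
The names pred_t, rev_h, low_bits_h and results_differ_as_svbool are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): 128-bit vector, so 16 predicate bits,
   one per byte.  A .H element therefore owns 2 adjacent bits.  */
typedef uint16_t pred_t;

/* Reverse the order of the eight 2-bit .H predicate elements,
   copying both bits of each element across verbatim.  */
pred_t rev_h(pred_t p)
{
    pred_t out = 0;
    for (int i = 0; i < 8; i++) {
        pred_t elt = (p >> (2 * i)) & 0x3;        /* element i, both bits */
        out |= (pred_t)(elt << (2 * (7 - i)));    /* place at mirrored slot */
    }
    return out;
}

/* Keep only the low bit of each 2-bit element.  This is the view in
   which two predicates count as "the same" if only the low bit of
   each .H element is treated as significant.  */
pred_t low_bits_h(pred_t p)
{
    return p & 0x5555;
}

/* Two inputs that agree on every low bit but differ in an upper bit.
   Returns nonzero if the reversed results differ as full 16-bit values.  */
int results_differ_as_svbool(void)
{
    pred_t a = 0x0003;  /* first .H element has both bits set */
    pred_t b = 0x0001;  /* first .H element has only the low bit set */
    assert(low_bits_h(a) == low_bits_h(b));
    return rev_h(a) != rev_h(b);
}
```

In this model, rev_h(0x0003) is 0xC000 and rev_h(0x0001) is 0x4000. The two
inputs are interchangeable if only the low bit of each element matters, yet
the results differ once every bit is significant, which is the situation
for an svbool_t result.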

Had a minor comment on one of the testcases, but otherwise OK.

Thanks!
Tamar

> Richard
> 
> Richard Sandiford (2):
>   aarch64: Use VNx16BI for more permutations [PR121294]
>   aarch64: Use VNx16BI for svrev_b* [PR121294]
> 
>  .../aarch64/aarch64-sve-builtins-base.cc      |  5 +-
>  .../aarch64/aarch64-sve-builtins-functions.h  |  5 +-
>  gcc/config/aarch64/aarch64-sve.md             | 62 ++++++++++--
>  gcc/config/aarch64/aarch64.cc                 |  3 +-
>  gcc/config/aarch64/aarch64.md                 |  1 +
>  gcc/config/aarch64/iterators.md               |  4 +-
>  .../aarch64/sve/acle/general/perm_2.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_3.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_4.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_5.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_6.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/perm_7.c         | 96 +++++++++++++++++++
>  .../aarch64/sve/acle/general/rev_2.c          | 24 +++++
>  13 files changed, 666 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_4.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_5.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_6.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/perm_7.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev_2.c
> 
> --
> 2.43.0
