Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Tuesday, July 29, 2025 4:33 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Alex Coplan <alex.cop...@arm.com>; Alice Carlotti
>> <alice.carlo...@arm.com>; pins...@gmail.com; ktkac...@nvidia.com;
>> Richard Earnshaw <richard.earns...@arm.com>; Tamar Christina
>> <tamar.christ...@arm.com>; Wilco Dijkstra <wilco.dijks...@arm.com>
>> Subject: [PATCH] aarch64: Use VNx16BI for more SVE WHILE* results [PR121118]
>>
>> PR121118 is about a case where we try to construct a predicate
>> constant using a permutation of a PFALSE and a WHILELO.  The WHILELO
>> is a .H operation and its result has mode VNx8BI.  However, the
>> permute instruction expects both inputs to be VNx16BI, leading to
>> an unrecognisable insn ICE.
>>
>> VNx8BI is effectively a form of VNx16BI in which every odd-indexed
>> bit is insignificant.  In the PR's testcase that's OK, since those
>> bits will be dropped by the permutation.  But if the WHILELO had been
>> a VNx4BI, so that only every fourth bit is significant, the input to
>> the permutation would have had undefined bits.  The testcase in the
>> patch has an example of this.
>>
>> This feeds into a related ACLE problem that I'd been meaning to
>> fix for a long time: every bit of an svbool_t result is significant,
>> and so every ACLE intrinsic that returns an svbool_t should return a
>> VNx16BI.  That doesn't currently happen for ACLE svwhile* intrinsics.
>>
>
> For my own understanding, *what* is the representation of svbool_t?
> ACLE just seems to say it has "enough bits" [1]
>
> [1] https://arm-software.github.io/acle/main/acle.html#sve-predicate-types
In memory, it has the same layout as for LDR/STR.  But as a type, it's
intentionally opaque in the base ACLE, so that the only things you can
do with it are through intrinsics, copying, or zero initialisation.
In GCC terms, though, it maps directly to VNx16BI.

>> This patch fixes both issues together.
>>
>> We still need to keep the current WHILE* patterns for
>> autovectorisation, where the result mode should match the element
>> width.  The patch therefore adds a new set of patterns that are
>> defined to return VNx16BI instead.  For want of a better scheme,
>> it uses an "_acle" suffix to distinguish these new patterns from
>> the "normal" ones.
>>
>> The formulation used is:
>>
>>   (and:VNx16BI (subreg:VNx16BI normal-pattern 0) C)
>>
>> where C has mode VNx16BI and is a canonical ptrue for normal-pattern's
>> element width (so that the low bit of each element is set and the
>> upper bits are clear).
>>
>> This is a bit clunky, and leads to some repetition.  But it has two
>> advantages:
>>
>> * With an earlier simplify-rtx patch, converting the above expression
>>   back to normal-pattern's mode will reduce to normal-pattern, so that
>>   the pattern for testing the result using a PTEST doesn't change.
>>
>> * It gives RTL optimisers a bit more information, as the new tests
>>   demonstrate.
>>
>> In the expression above, C is matched using a new "special" predicate,
>> aarch64_ptrue_all_operand, where "special" means that the mode on the
>> predicate is not necessarily the mode of the expression.  In this
>> case, C always has mode VNx16BI, but the mode on the predicate
>> indicates which kind of canonical PTRUE is needed.
>>
>> Tested on aarch64-linux-gnu.  OK to install?
>
> OK thanks.
>
> I was wondering if we couldn't do anything special for the cases where
> both operands of the while* are zero.  But it looks like PTRUES are as
> bad as WHILE* and we have no PFALSES anyway ☹

There is a PFALSE, although it's not used much.
But at the moment we do any constant folding on gimple rather than
during expand, via:

  gimple *
  fold (gimple_folder &f) const override
  {
    if (f.vectors_per_tuple () > 1)
      return nullptr;

    /* Filter out cases where the condition is always true or always
       false.  */
    tree arg1 = gimple_call_arg (f.call, 1);
    if (!m_eq_p && operand_equal_p (arg1, TYPE_MIN_VALUE (TREE_TYPE (arg1))))
      return f.fold_to_pfalse ();
    if (m_eq_p && operand_equal_p (arg1, TYPE_MAX_VALUE (TREE_TYPE (arg1))))
      return f.fold_to_ptrue ();

    if (f.type_suffix (1).unsigned_p)
      return fold_type<poly_uint64> (f);
    else
      return fold_type<poly_int64> (f);
  }

It only handles the original WHILELE/LO/LS/LT forms though.  I suppose
we should extend it to WHILEGE/GT/HI/HS at some point.

Thanks,
Richard