This series incrementally adds support for operations on unpacked vectors of floating-point values. By "unpacked", we're referring to the in-register layout of partial SVE vector modes. For example, the elements of a VNx4HF are stored as:

    ... | X | HF | X | HF | X | HF | X | HF |

Where 'X' denotes the undefined upper half of the 32-bit container that each 16-bit value is stored in. This padding must not affect the operation's behavior, so should not be interpreted if the operation may trap.

The series is organised as follows:
* NFCs to iterators.md that lay the groundwork for the rest of the series.
* Unpacked conversions, in which a solution to the issue described above is given.
* Unpacked comparisons, which are slightly less trivial than...
* Unpacked unary/binary/ternary operations, each of which is broken down into:
  * Defining the unconditional expansion
  * Supporting OP/UNSPEC_SEL combiner patterns under SVE_RELAXED_GP
  * Defining the conditional expander (if applicable)

This allows each change to aarch64-sve.md to be testable; once the conditional expander for an operation is defined, the rules in match.pd canonicalize any occurrence of that operation combined with a VEC_COND_EXPR into these conditional forms, which would make the SVE_RELAXED_GP patterns dead at trunk. I've taken this approach because I believe it's valuable to have these patterns to fall back on.

Notes on code generation under -ftrapping-math:

1) In the example below, we're currently unable to remove (1) in favour of (2).

    ptrue   p6.b, all                   (1)
    ptrue   p7.d, all                   (2)
    ld1w    z30.d, p6/z, [x1]
    ld1w    z29.d, p6/z, [x3]
    fsub    z30.s, p7/m, z30.s, #1.0

In the expanded RTL, the predicate source of the LD1Ws is a (subreg:VNx2BI (reg:VNx16BI 111) 0), where every bit of 111 is a 1. The predicate source of the FSUB is a (subreg:VNx4BI (reg:VNx16BI 112) 0), where every 8th bit of 112 is a 1, and the rest are 0.

2) The AND emitted by the conditional expander typically follows a CMP<CC> operation, where it is trivially redundant.

    cmpne   p5.d, p7/z, z0.d, #0
    ptrue   p6.d, vl32
    and     p6.b, p6/z, p5.b, p5.b

The fold we need here is slightly different from what the existing *cmp<cmp_op><mode>_and splitting patterns achieve, in that we don't need to replace p7 with p6 to make the AND redundant.

The AND in this case has the structure:

    (set (reg:VNx4BI 113)
         (and (subreg:VNx4BI (reg:VNx16BI 111) 0)
              (subreg:VNx4BI (reg:VNx2BI 112) 0)))

This problem feels somewhat related to how we might handle https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151.

Bootstrapped & regtested on aarch64-linux-gnu.

Thanks,
Spencer

Spencer Abson (14):
  aarch64: Extend iterator support for partial SVE FP modes
  aarch64: Add support for unpacked SVE FP conversions
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions
  aarch64: Add support for unpacked SVE FP comparisons
  aarch64: Compare/and splits for unpacked SVE FP comparisons
  aarch64: Add support for unpacked SVE FP unary operations
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP unary operations
  aarch64: Add support for unpacked SVE FP binary arithmetic
  aarch64: Add support for unpacked SVE FDIV
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP binary arithmetic
  aarch64: Add support for unpacked SVE FP conditional binary arithmetic
  aarch64: Add support for unpacked SVE FP ternary arithmetic
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary arithmetic
  aarch64: Add support for unpacked SVE FP conditional ternary arithmetic

 gcc/config/aarch64/aarch64-protos.h | 4 +
 gcc/config/aarch64/aarch64-sve.md | 889 ++++++++++++------
 gcc/config/aarch64/aarch64-sve2.md | 10 +-
 gcc/config/aarch64/aarch64.cc | 125 ++-
 gcc/config/aarch64/iterators.md | 97 +-
 gcc/config/aarch64/predicates.md | 4 +
 .../aarch64/sve/unpacked_binary_bf16_1.C | 35 +
 .../aarch64/sve/unpacked_binary_bf16_2.C | 15 +
 .../aarch64/sve/unpacked_cond_binary_bf16_1.C | 46 +
 .../aarch64/sve/unpacked_cond_binary_bf16_2.C | 18 +
 .../sve/unpacked_cond_ternary_bf16_1.C | 35 +
 .../sve/unpacked_cond_ternary_bf16_2.C | 14 +
 .../aarch64/sve/unpacked_ternary_bf16_1.C | 27 +
 .../aarch64/sve/unpacked_ternary_bf16_2.C | 11 +
 .../aarch64/sve/pack_fcvt_signed_1.c | 2 +-
 .../aarch64/sve/pack_fcvt_unsigned_1.c | 2 +-
 .../gcc.target/aarch64/sve/pack_float_1.c | 2 +-
 .../gcc.target/aarch64/sve/unpack_float_1.c | 2 +-
 .../aarch64/sve/unpacked_builtin_fmax_1.c | 40 +
 .../aarch64/sve/unpacked_builtin_fmax_2.c | 16 +
 .../aarch64/sve/unpacked_builtin_fmin_1.c | 40 +
 .../aarch64/sve/unpacked_builtin_fmin_2.c | 16 +
 .../sve/unpacked_cond_builtin_fmax_1.c | 47 +
 .../sve/unpacked_cond_builtin_fmax_2.c | 20 +
 .../sve/unpacked_cond_builtin_fmin_1.c | 47 +
 .../sve/unpacked_cond_builtin_fmin_2.c | 20 +
 .../aarch64/sve/unpacked_cond_cvtf_1.c | 47 +
 .../aarch64/sve/unpacked_cond_fabs_1.c | 32 +
 .../aarch64/sve/unpacked_cond_fadd_1.c | 58 ++
 .../aarch64/sve/unpacked_cond_fadd_2.c | 24 +
 .../aarch64/sve/unpacked_cond_fcvt_1.c | 37 +
 .../aarch64/sve/unpacked_cond_fcvtz_1.c | 51 +
 .../aarch64/sve/unpacked_cond_fdiv_1.c | 43 +
 .../aarch64/sve/unpacked_cond_fdiv_2.c | 18 +
 .../aarch64/sve/unpacked_cond_fmaxnm_1.c | 49 +
 .../aarch64/sve/unpacked_cond_fmaxnm_2.c | 20 +
 .../aarch64/sve/unpacked_cond_fminnm_1.c | 49 +
 .../aarch64/sve/unpacked_cond_fminnm_2.c | 20 +
 .../aarch64/sve/unpacked_cond_fmla_1.c | 47 +
 .../aarch64/sve/unpacked_cond_fmla_2.c | 18 +
 .../aarch64/sve/unpacked_cond_fmls_1.c | 47 +
 .../aarch64/sve/unpacked_cond_fmls_2.c | 18 +
 .../aarch64/sve/unpacked_cond_fmul_1.c | 46 +
 .../aarch64/sve/unpacked_cond_fmul_2.c | 18 +
 .../aarch64/sve/unpacked_cond_fneg_1.c | 34 +
 .../aarch64/sve/unpacked_cond_fnmla_1.c | 47 +
 .../aarch64/sve/unpacked_cond_fnmla_2.c | 18 +
 .../aarch64/sve/unpacked_cond_fnmls_1.c | 47 +
 .../aarch64/sve/unpacked_cond_fnmls_2.c | 18 +
 .../aarch64/sve/unpacked_cond_frinta_1.c | 32 +
 .../aarch64/sve/unpacked_cond_frinti_1.c | 32 +
 .../aarch64/sve/unpacked_cond_frintm_1.c | 32 +
 .../aarch64/sve/unpacked_cond_frintp_1.c | 32 +
 .../aarch64/sve/unpacked_cond_frintx_1.c | 32 +
 .../aarch64/sve/unpacked_cond_frintz_1.c | 32 +
 .../aarch64/sve/unpacked_cond_fsubr_1.c | 53 ++
 .../aarch64/sve/unpacked_cond_fsubr_2.c | 22 +
 .../gcc.target/aarch64/sve/unpacked_cvtf_1.c | 217 +++++
 .../gcc.target/aarch64/sve/unpacked_cvtf_2.c | 23 +
 .../gcc.target/aarch64/sve/unpacked_cvtf_3.c | 12 +
 .../gcc.target/aarch64/sve/unpacked_fabs_1.c | 24 +
 .../gcc.target/aarch64/sve/unpacked_fadd_1.c | 48 +
 .../gcc.target/aarch64/sve/unpacked_fadd_2.c | 22 +
 .../gcc.target/aarch64/sve/unpacked_fcm_1.c | 547 +++++++++++
 .../gcc.target/aarch64/sve/unpacked_fcm_2.c | 47 +
 .../aarch64/sve/unpacked_fcm_and_1.c | 18 +
 .../gcc.target/aarch64/sve/unpacked_fcvt_1.c | 118 +++
 .../gcc.target/aarch64/sve/unpacked_fcvt_2.c | 16 +
 .../gcc.target/aarch64/sve/unpacked_fcvtz_1.c | 244 +++++
 .../gcc.target/aarch64/sve/unpacked_fcvtz_2.c | 26 +
 .../gcc.target/aarch64/sve/unpacked_fdiv_1.c | 34 +
 .../gcc.target/aarch64/sve/unpacked_fdiv_2.c | 11 +
 .../gcc.target/aarch64/sve/unpacked_fdiv_3.c | 11 +
 .../aarch64/sve/unpacked_fmaxnm_1.c | 41 +
 .../aarch64/sve/unpacked_fmaxnm_2.c | 16 +
 .../aarch64/sve/unpacked_fminnm_1.c | 42 +
 .../aarch64/sve/unpacked_fminnm_2.c | 16 +
 .../gcc.target/aarch64/sve/unpacked_fmla_1.c | 34 +
 .../gcc.target/aarch64/sve/unpacked_fmla_2.c | 11 +
 .../gcc.target/aarch64/sve/unpacked_fmls_1.c | 34 +
 .../gcc.target/aarch64/sve/unpacked_fmls_2.c | 11 +
 .../gcc.target/aarch64/sve/unpacked_fmul_1.c | 39 +
 .../gcc.target/aarch64/sve/unpacked_fmul_2.c | 14 +
 .../gcc.target/aarch64/sve/unpacked_fneg_1.c | 26 +
 .../gcc.target/aarch64/sve/unpacked_fnmla_1.c | 34 +
 .../gcc.target/aarch64/sve/unpacked_fnmla_2.c | 11 +
 .../gcc.target/aarch64/sve/unpacked_fnmls_1.c | 34 +
 .../gcc.target/aarch64/sve/unpacked_fnmls_2.c | 11 +
 .../aarch64/sve/unpacked_frinta_1.c | 27 +
 .../aarch64/sve/unpacked_frinta_2.c | 11 +
 .../aarch64/sve/unpacked_frinti_1.c | 27 +
 .../aarch64/sve/unpacked_frinti_2.c | 11 +
 .../aarch64/sve/unpacked_frintm_1.c | 27 +
 .../aarch64/sve/unpacked_frintm_2.c | 11 +
 .../aarch64/sve/unpacked_frintp_1.c | 27 +
 .../aarch64/sve/unpacked_frintp_2.c | 11 +
 .../aarch64/sve/unpacked_frintx_1.c | 27 +
 .../aarch64/sve/unpacked_frintx_2.c | 11 +
 .../aarch64/sve/unpacked_frintz_1.c | 27 +
 .../aarch64/sve/unpacked_frintz_2.c | 11 +
 .../gcc.target/aarch64/sve/unpacked_fsubr_1.c | 42 +
 .../gcc.target/aarch64/sve/unpacked_fsubr_2.c | 16 +
 102 files changed, 4371 insertions(+), 364 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_binary_bf16_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_binary_bf16_2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_binary_bf16_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_binary_bf16_2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_ternary_bf16_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_ternary_bf16_2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_ternary_bf16_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_ternary_bf16_2.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmax_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmax_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmin_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmin_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fabs_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fadd_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fadd_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fcvt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fcvtz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fdiv_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fdiv_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmaxnm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmaxnm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fminnm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fminnm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmla_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmla_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmls_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmls_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmul_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmul_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fneg_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmla_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmls_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frinta_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frinti_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintp_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintx_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fsubr_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fsubr_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fabs_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fadd_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fadd_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_and_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmaxnm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmaxnm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fminnm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fminnm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmla_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmla_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmls_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmls_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmul_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmul_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fneg_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmla_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmla_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmls_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmls_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinta_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinta_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinti_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinti_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintp_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintp_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintx_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintx_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintz_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fsubr_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fsubr_2.c

--
2.34.1
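
As an aside on note 1) in the cover letter: the following is a minimal C sketch, not GCC code, of why the two PTRUEs are interchangeable for the LD1Ws but not for the FSUB. It assumes a 128-bit vector (so 16 predicate bits, one per byte) and relies on the SVE rule that an instruction on N-byte elements consults only the lowest predicate bit of each N-bit group. The names PRED_BITS, model_ptrue and same_active_elements are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a 128-bit vector, i.e. one predicate bit per vector byte.  */
enum { PRED_BITS = 16 };

/* Model "ptrue pN.<T>, all" for an element size of ELT_BYTES: only the
   lowest bit of each ELT_BYTES-wide group of predicate bits is set.  */
static void
model_ptrue (bool pred[PRED_BITS], size_t elt_bytes)
{
  assert (elt_bytes == 1 || elt_bytes == 2 || elt_bytes == 4 || elt_bytes == 8);
  for (size_t i = 0; i < PRED_BITS; i++)
    pred[i] = (i % elt_bytes) == 0;
}

/* Return true if predicates A and B enable exactly the same elements for
   an instruction operating on ELT_BYTES-sized elements.  Such an
   instruction reads only the lowest bit of each group, so the remaining
   bits are ignored here as well.  */
static bool
same_active_elements (const bool a[PRED_BITS], const bool b[PRED_BITS],
                      size_t elt_bytes)
{
  for (size_t i = 0; i < PRED_BITS; i += elt_bytes)
    if (a[i] != b[i])
      return false;
  return true;
}
```

With p6 built by model_ptrue (p6, 1) and p7 by model_ptrue (p7, 8), same_active_elements (p6, p7, 8) holds, which matches the observation that the .d loads would be equally well served by (2). same_active_elements (p6, p7, 4) does not hold: governing the .s FSUB with p6 would also enable the lanes that hold the undefined padding.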