This series incrementally adds support for operations on unpacked vectors
of floating-point values.  By "unpacked", we're referring to the in-register
layout of partial SVE vector modes.  For example, the elements of a VNx4HF
are stored as:

... | X | HF | X | HF | X | HF | X | HF |

Where 'X' denotes the undefined upper half of the 32-bit container that each
16-bit value is stored in.  This padding must not affect the operation's
behavior, so it must not be interpreted by any operation that may trap.
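
As a rough illustration of the layout above, the following C sketch models a
container as a uint32_t whose low half holds the raw 16-bit encoding and whose
upper half holds arbitrary bits.  The helper names (make_container,
container_payload) are hypothetical and exist only for this example; nothing
here is taken from the patches themselves.

```c
#include <stdint.h>

/* Hypothetical model of one element of an unpacked VNx4HF: a 16-bit
   payload in the low half of a 32-bit container.  JUNK stands in for the
   undefined upper half ('X' in the diagram above).  */
static uint32_t
make_container (uint16_t payload, uint16_t junk)
{
  return ((uint32_t) junk << 16) | payload;
}

/* Read back only the defined low half; the upper half is never
   interpreted.  */
static uint16_t
container_payload (uint32_t container)
{
  return (uint16_t) (container & 0xffff);
}

/* Two containers hold the same element if their payloads match, whatever
   the upper halves contain.  */
static int
same_element (uint32_t a, uint32_t b)
{
  return container_payload (a) == container_payload (b);
}
```

For instance, 0x3c00 is the binary16 encoding of 1.0, so
make_container (0x3c00, 0xdead) and make_container (0x3c00, 0x0000) describe
the same element even though the full 32-bit values differ.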

The series is organised as follows:
        * NFCs to iterators.md that lay the groundwork for the rest of the
        series.
        * Unpacked conversions, in which a solution to the issue described
        above is given.
        * Unpacked comparisons, which are slightly less trivial than...
        * Unpacked unary/binary/ternary operations, each of which is broken
        down into:
                * Defining the unconditional expansion
                * Supporting OP/UNSPEC_SEL combiner patterns under
                SVE_RELAXED_GP
                * Defining the conditional expander (if applicable)

Adding the SVE_RELAXED_GP combiner patterns before the conditional expander
keeps each change to aarch64-sve.md testable.  Once the conditional expander
for an operation is defined, the rules in match.pd canonicalize any occurrence
of that operation combined with a VEC_COND_EXPR into the conditional form,
which leaves the SVE_RELAXED_GP patterns dead on trunk.  I’ve taken this
approach because I believe it’s valuable to have these patterns to fall back
on.

Notes on code generation under -ftrapping-math:

1) In the example below, we're currently unable to remove (1) in favour of
(2).

ptrue   p6.b, all   (1)
ptrue   p7.d, all   (2)
ld1w    z30.d, p6/z, [x1]
ld1w    z29.d, p6/z, [x3]
fsub    z30.s, p7/m, z30.s, #1.0

In the expanded RTL, the predicate source of the LD1Ws is a
(subreg:VNx2BI (reg:VNx16BI 111) 0), where every bit of 111 is a 1.  The
predicate source of the FSUB is a (subreg:VNx4BI (reg:VNx16BI 112) 0),
where every 8th bit of 112 is a 1, and the rest are 0.
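
To make the two bit patterns concrete, here is a minimal C sketch that models
a VNx16BI predicate for a 128-bit vector as a uint16_t (one bit per byte).
The 128-bit length and the helper names are assumptions chosen for the
example.  It relies only on the architectural rule that the lowest predicate
bit of each element decides whether the lane is active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed model: a 128-bit vector, so 16 predicate bits.  */
#define PRED_BITS 16

/* Model of PTRUE Pd.<T>, ALL: set the lowest bit of every
   ESIZE_BYTES-wide element and leave the other bits clear.  */
static uint16_t
ptrue_all (unsigned int esize_bytes)
{
  uint16_t p = 0;
  for (unsigned int i = 0; i < PRED_BITS; i += esize_bytes)
    p |= (uint16_t) (1u << i);
  return p;
}

/* Only the lowest bit of each element's predicate field is significant.  */
static bool
lane_active (uint16_t p, unsigned int esize_bytes, unsigned int lane)
{
  return (p >> (lane * esize_bytes)) & 1;
}

/* Return true if P and Q enable the same lanes at the given element
   size.  */
static bool
same_lanes (uint16_t p, uint16_t q, unsigned int esize_bytes)
{
  for (unsigned int lane = 0; lane < PRED_BITS / esize_bytes; lane++)
    if (lane_active (p, esize_bytes, lane)
	!= lane_active (q, esize_bytes, lane))
      return false;
  return true;
}
```

In this model ptrue_all (1) and ptrue_all (8) enable the same .d lanes, which
is why the LD1Ws could in principle use (2).  They differ at .s granularity,
where (1) would also enable the odd lanes that hold undefined bits, so the
FSUB cannot use (1).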

2) The AND emitted by the conditional expander typically follows a CMP<CC>
operation, where it is trivially redundant.

cmpne   p5.d, p7/z, z0.d, #0
ptrue   p6.d, vl32
and     p6.b, p6/z, p5.b, p5.b

The fold we need here is slightly different from what the existing
*cmp<cmp_op><mode>_and splitting patterns achieve, in that we don’t need to
replace p7 with p6 to make the AND redundant.

The AND in this case has the structure:

(set (reg:VNx4BI 113)
    (and (subreg:VNx4BI (reg:VNx16BI 111) 0)
         (subreg:VNx4BI (reg:VNx2BI 112) 0)))
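
The redundancy can be sketched with a small C model, again treating a
predicate as a uint16_t for an assumed 128-bit vector.  The function names are
hypothetical, and the model assumes the PTRUE covers every .d lane (as vl32
does for a 2048-bit vector).  It is an illustration of why the AND changes
nothing, not a description of how the fold should be implemented.

```c
#include <stdint.h>

/* Assumed model: a 128-bit vector, so 16 predicate bits and 2 .d lanes.  */
#define PRED_BITS 16
#define D_LANES (PRED_BITS / 8)

/* Model of PTRUE Pd.D covering every lane: every 8th bit set.  */
static uint16_t
ptrue_d (void)
{
  uint16_t p = 0;
  for (unsigned int lane = 0; lane < D_LANES; lane++)
    p |= (uint16_t) (1u << (lane * 8));
  return p;
}

/* Model of CMPNE Pd.D, Pg/Z, Zn.D, #0: the result can only have the
   lowest bit of each .d element set; all other bits are zero.  */
static uint16_t
cmpne_d_zero (uint16_t pg, const int64_t *zn)
{
  uint16_t pd = 0;
  for (unsigned int lane = 0; lane < D_LANES; lane++)
    if (((pg >> (lane * 8)) & 1) && zn[lane] != 0)
      pd |= (uint16_t) (1u << (lane * 8));
  return pd;
}

/* Model of AND Pd.B, Pg/Z, Pn.B, Pm.B: a bitwise AND of all three.  */
static uint16_t
and_z (uint16_t pg, uint16_t pn, uint16_t pm)
{
  return pg & pn & pm;
}
```

Because the compare result is already a subset of the .d PTRUE,
and_z (ptrue_d (), p5, p5) returns p5 unchanged whichever governing predicate
the compare used.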

This problem feels somewhat related to how we might handle
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118151.


Bootstrapped & regtested on aarch64-linux-gnu.

Thanks,
Spencer

Spencer Abson (14):
  aarch64: Extend iterator support for partial SVE FP modes
  aarch64: Add support for unpacked SVE FP conversions
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions
  aarch64: Add support for unpacked SVE FP comparisons
  aarch64: Compare/and splits for unpacked SVE FP comparisons
  aarch64: Add support for unpacked SVE FP unary operations
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP unary
    operations
  aarch64: Add support for unpacked SVE FP binary arithmetic
  aarch64: Add support for unpacked SVE FDIV
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP binary
    arithmetic
  aarch64: Add support for unpacked SVE FP conditional binary arithmetic
  aarch64: Add support for unpacked SVE FP ternary arithmetic
  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary
    arithmetic
  aarch64: Add support for unpacked SVE FP conditional ternary
    arithmetic

 gcc/config/aarch64/aarch64-protos.h           |   4 +
 gcc/config/aarch64/aarch64-sve.md             | 889 ++++++++++++------
 gcc/config/aarch64/aarch64-sve2.md            |  10 +-
 gcc/config/aarch64/aarch64.cc                 | 125 ++-
 gcc/config/aarch64/iterators.md               |  97 +-
 gcc/config/aarch64/predicates.md              |   4 +
 .../aarch64/sve/unpacked_binary_bf16_1.C      |  35 +
 .../aarch64/sve/unpacked_binary_bf16_2.C      |  15 +
 .../aarch64/sve/unpacked_cond_binary_bf16_1.C |  46 +
 .../aarch64/sve/unpacked_cond_binary_bf16_2.C |  18 +
 .../sve/unpacked_cond_ternary_bf16_1.C        |  35 +
 .../sve/unpacked_cond_ternary_bf16_2.C        |  14 +
 .../aarch64/sve/unpacked_ternary_bf16_1.C     |  27 +
 .../aarch64/sve/unpacked_ternary_bf16_2.C     |  11 +
 .../aarch64/sve/pack_fcvt_signed_1.c          |   2 +-
 .../aarch64/sve/pack_fcvt_unsigned_1.c        |   2 +-
 .../gcc.target/aarch64/sve/pack_float_1.c     |   2 +-
 .../gcc.target/aarch64/sve/unpack_float_1.c   |   2 +-
 .../aarch64/sve/unpacked_builtin_fmax_1.c     |  40 +
 .../aarch64/sve/unpacked_builtin_fmax_2.c     |  16 +
 .../aarch64/sve/unpacked_builtin_fmin_1.c     |  40 +
 .../aarch64/sve/unpacked_builtin_fmin_2.c     |  16 +
 .../sve/unpacked_cond_builtin_fmax_1.c        |  47 +
 .../sve/unpacked_cond_builtin_fmax_2.c        |  20 +
 .../sve/unpacked_cond_builtin_fmin_1.c        |  47 +
 .../sve/unpacked_cond_builtin_fmin_2.c        |  20 +
 .../aarch64/sve/unpacked_cond_cvtf_1.c        |  47 +
 .../aarch64/sve/unpacked_cond_fabs_1.c        |  32 +
 .../aarch64/sve/unpacked_cond_fadd_1.c        |  58 ++
 .../aarch64/sve/unpacked_cond_fadd_2.c        |  24 +
 .../aarch64/sve/unpacked_cond_fcvt_1.c        |  37 +
 .../aarch64/sve/unpacked_cond_fcvtz_1.c       |  51 +
 .../aarch64/sve/unpacked_cond_fdiv_1.c        |  43 +
 .../aarch64/sve/unpacked_cond_fdiv_2.c        |  18 +
 .../aarch64/sve/unpacked_cond_fmaxnm_1.c      |  49 +
 .../aarch64/sve/unpacked_cond_fmaxnm_2.c      |  20 +
 .../aarch64/sve/unpacked_cond_fminnm_1.c      |  49 +
 .../aarch64/sve/unpacked_cond_fminnm_2.c      |  20 +
 .../aarch64/sve/unpacked_cond_fmla_1.c        |  47 +
 .../aarch64/sve/unpacked_cond_fmla_2.c        |  18 +
 .../aarch64/sve/unpacked_cond_fmls_1.c        |  47 +
 .../aarch64/sve/unpacked_cond_fmls_2.c        |  18 +
 .../aarch64/sve/unpacked_cond_fmul_1.c        |  46 +
 .../aarch64/sve/unpacked_cond_fmul_2.c        |  18 +
 .../aarch64/sve/unpacked_cond_fneg_1.c        |  34 +
 .../aarch64/sve/unpacked_cond_fnmla_1.c       |  47 +
 .../aarch64/sve/unpacked_cond_fnmla_2.c       |  18 +
 .../aarch64/sve/unpacked_cond_fnmls_1.c       |  47 +
 .../aarch64/sve/unpacked_cond_fnmls_2.c       |  18 +
 .../aarch64/sve/unpacked_cond_frinta_1.c      |  32 +
 .../aarch64/sve/unpacked_cond_frinti_1.c      |  32 +
 .../aarch64/sve/unpacked_cond_frintm_1.c      |  32 +
 .../aarch64/sve/unpacked_cond_frintp_1.c      |  32 +
 .../aarch64/sve/unpacked_cond_frintx_1.c      |  32 +
 .../aarch64/sve/unpacked_cond_frintz_1.c      |  32 +
 .../aarch64/sve/unpacked_cond_fsubr_1.c       |  53 ++
 .../aarch64/sve/unpacked_cond_fsubr_2.c       |  22 +
 .../gcc.target/aarch64/sve/unpacked_cvtf_1.c  | 217 +++++
 .../gcc.target/aarch64/sve/unpacked_cvtf_2.c  |  23 +
 .../gcc.target/aarch64/sve/unpacked_cvtf_3.c  |  12 +
 .../gcc.target/aarch64/sve/unpacked_fabs_1.c  |  24 +
 .../gcc.target/aarch64/sve/unpacked_fadd_1.c  |  48 +
 .../gcc.target/aarch64/sve/unpacked_fadd_2.c  |  22 +
 .../gcc.target/aarch64/sve/unpacked_fcm_1.c   | 547 +++++++++++
 .../gcc.target/aarch64/sve/unpacked_fcm_2.c   |  47 +
 .../aarch64/sve/unpacked_fcm_and_1.c          |  18 +
 .../gcc.target/aarch64/sve/unpacked_fcvt_1.c  | 118 +++
 .../gcc.target/aarch64/sve/unpacked_fcvt_2.c  |  16 +
 .../gcc.target/aarch64/sve/unpacked_fcvtz_1.c | 244 +++++
 .../gcc.target/aarch64/sve/unpacked_fcvtz_2.c |  26 +
 .../gcc.target/aarch64/sve/unpacked_fdiv_1.c  |  34 +
 .../gcc.target/aarch64/sve/unpacked_fdiv_2.c  |  11 +
 .../gcc.target/aarch64/sve/unpacked_fdiv_3.c  |  11 +
 .../aarch64/sve/unpacked_fmaxnm_1.c           |  41 +
 .../aarch64/sve/unpacked_fmaxnm_2.c           |  16 +
 .../aarch64/sve/unpacked_fminnm_1.c           |  42 +
 .../aarch64/sve/unpacked_fminnm_2.c           |  16 +
 .../gcc.target/aarch64/sve/unpacked_fmla_1.c  |  34 +
 .../gcc.target/aarch64/sve/unpacked_fmla_2.c  |  11 +
 .../gcc.target/aarch64/sve/unpacked_fmls_1.c  |  34 +
 .../gcc.target/aarch64/sve/unpacked_fmls_2.c  |  11 +
 .../gcc.target/aarch64/sve/unpacked_fmul_1.c  |  39 +
 .../gcc.target/aarch64/sve/unpacked_fmul_2.c  |  14 +
 .../gcc.target/aarch64/sve/unpacked_fneg_1.c  |  26 +
 .../gcc.target/aarch64/sve/unpacked_fnmla_1.c |  34 +
 .../gcc.target/aarch64/sve/unpacked_fnmla_2.c |  11 +
 .../gcc.target/aarch64/sve/unpacked_fnmls_1.c |  34 +
 .../gcc.target/aarch64/sve/unpacked_fnmls_2.c |  11 +
 .../aarch64/sve/unpacked_frinta_1.c           |  27 +
 .../aarch64/sve/unpacked_frinta_2.c           |  11 +
 .../aarch64/sve/unpacked_frinti_1.c           |  27 +
 .../aarch64/sve/unpacked_frinti_2.c           |  11 +
 .../aarch64/sve/unpacked_frintm_1.c           |  27 +
 .../aarch64/sve/unpacked_frintm_2.c           |  11 +
 .../aarch64/sve/unpacked_frintp_1.c           |  27 +
 .../aarch64/sve/unpacked_frintp_2.c           |  11 +
 .../aarch64/sve/unpacked_frintx_1.c           |  27 +
 .../aarch64/sve/unpacked_frintx_2.c           |  11 +
 .../aarch64/sve/unpacked_frintz_1.c           |  27 +
 .../aarch64/sve/unpacked_frintz_2.c           |  11 +
 .../gcc.target/aarch64/sve/unpacked_fsubr_1.c |  42 +
 .../gcc.target/aarch64/sve/unpacked_fsubr_2.c |  16 +
 102 files changed, 4371 insertions(+), 364 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_binary_bf16_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_binary_bf16_2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_binary_bf16_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_binary_bf16_2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_ternary_bf16_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_cond_ternary_bf16_2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_ternary_bf16_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/unpacked_ternary_bf16_2.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmax_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmax_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmin_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_builtin_fmin_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fabs_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fadd_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fadd_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fcvt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fcvtz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fdiv_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fdiv_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmaxnm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmaxnm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fminnm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fminnm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmla_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmla_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmls_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmls_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmul_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fmul_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fneg_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmla_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fnmls_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frinta_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frinti_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintp_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintx_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_frintz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fsubr_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_fsubr_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_cvtf_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fabs_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fadd_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fadd_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_and_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvt_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcvtz_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fdiv_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmaxnm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmaxnm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fminnm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fminnm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmla_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmla_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmls_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmls_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmul_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fmul_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fneg_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmla_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmla_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmls_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fnmls_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinta_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinta_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinti_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frinti_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintm_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintm_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintp_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintp_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintx_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintx_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintz_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_frintz_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fsubr_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fsubr_2.c

-- 
2.34.1
