This patch series completes support for the SME2 and SME2p1 intrinsics that
involve modal 8-bit floating-point types.

- The first patch in the series adds tests for using the luti intrinsics with
  mf8. This has worked since the intrinsics were introduced, and the usage is
  now documented in ACLE.
- The second patch extends the definitions of existing non-interpreting sve2/sme
  intrinsics to support mfloat8 types.
- The third and fourth patches add widening and narrowing sme2 fp8 conversions
  respectively (svcvt).
- The fifth patch adds multi-vector floating-point adjust exponent intrinsics
  (svscale).
- The sixth patch adds support for the sme-f8f16 and sme-f8f32 arch features
  and related defines.
- The seventh patch adds multi-vector 8-bit floating-point multiply-add long
  intrinsics.
- The eighth patch adds 8-bit floating-point sum of outer products and
  accumulate intrinsics.
- The ninth patch adds 8-bit floating-point dot product intrinsics.
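
As a rough illustration of how user code might test for the features added in
the sixth patch, here is a minimal sketch. It assumes the related defines use
the ACLE feature-macro names __ARM_FEATURE_SME_F8F16 and
__ARM_FEATURE_SME_F8F32, which this cover letter does not spell out; the helper
function names are hypothetical and not part of the series.

```c
#include <stdio.h>

/* Assumption: the defines follow the ACLE feature-macro naming
   (__ARM_FEATURE_SME_F8F16, __ARM_FEATURE_SME_F8F32).  The helper names
   below are hypothetical and exist only for this sketch.  */

/* Return 1 if the compiler advertises sme-f8f16, 0 otherwise.  */
static int
have_sme_f8f16 (void)
{
#ifdef __ARM_FEATURE_SME_F8F16
  return 1;
#else
  return 0;
#endif
}

/* Return 1 if the compiler advertises sme-f8f32, 0 otherwise.  */
static int
have_sme_f8f32 (void)
{
#ifdef __ARM_FEATURE_SME_F8F32
  return 1;
#else
  return 0;
#endif
}

/* Print which of the two features the current target options enable.  */
static void
report_fp8_sme_features (void)
{
  printf ("sme-f8f16: %d\n", have_sme_f8f16 ());
  printf ("sme-f8f32: %d\n", have_sme_f8f32 ());
}
```

On a target or option set without these extensions both helpers return 0, so
the sketch compiles anywhere; enabling the corresponding arch feature would be
expected to flip the matching result to 1.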

Compared to version 1 of this patch series:
- updated commit messages as requested.
- fixed gating of intrinsics in patch four (narrowing sme2 conversions to fp8).
- introduced the aarch64_output_asm_with_extra_operand function and updated the
  insns in aarch64-sme.md to no longer use out-of-bounds operands.

Compared to version 2 of this patch series:
- in patch 7, replaced aarch64_output_asm_with_extra_operand with
  aarch64_output_asm_with_offset, which does not require allocating space for
  operands on the stack.

Regression tested on aarch64-unknown-linux-gnu.

OK to merge?

Thanks,
Claudio Bantaloukas


Claudio Bantaloukas (8):
  aarch64: add tests for sme mfloat8 luti functions
  aarch64: extend sme intrinsics to mfp8
  aarch64: add widening sme2 fp8 conversions
  aarch64: add narrowing sme2 conversions to fp8
  aarch64: add multi-vector floating-point adjust exponent intrinsics
  aarch64: add basic support for sme-f8f16 and sme-f8f32
  aarch64: add Multi-vector 8-bit floating-point multiply-add long
  aarch64: add 8-bit floating-point sum of outer products and accumulate

Karl Meakin (1):
  aarch64: add 8-bit floating point dot product

 gcc/config/aarch64/aarch64-c.cc               |   4 +
 .../aarch64/aarch64-option-extensions.def     |   4 +
 gcc/config/aarch64/aarch64-protos.h           |   1 +
 gcc/config/aarch64/aarch64-sme.md             | 572 ++++++++++++++++++
 .../aarch64/aarch64-sve-builtins-base.cc      |  49 +-
 .../aarch64/aarch64-sve-builtins-functions.h  |  23 +-
 .../aarch64/aarch64-sve-builtins-shapes.cc    |  43 +-
 .../aarch64/aarch64-sve-builtins-shapes.h     |   1 +
 .../aarch64/aarch64-sve-builtins-sme.cc       |  20 +-
 .../aarch64/aarch64-sve-builtins-sme.def      |  55 +-
 gcc/config/aarch64/aarch64-sve-builtins-sme.h |   2 +
 .../aarch64/aarch64-sve-builtins-sve2.cc      |   2 +
 .../aarch64/aarch64-sve-builtins-sve2.def     |  12 +
 .../aarch64/aarch64-sve-builtins-sve2.h       |   2 +
 gcc/config/aarch64/aarch64-sve-builtins.cc    |  34 +-
 gcc/config/aarch64/aarch64-sve2.md            |  52 +-
 gcc/config/aarch64/aarch64.cc                 |   8 +
 gcc/config/aarch64/aarch64.h                  |  10 +
 gcc/config/aarch64/iterators.md               |  73 ++-
 gcc/doc/invoke.texi                           |   6 +
 .../aarch64/sme2/aarch64-sme2-acle-asm.exp    |   5 +-
 .../gcc.target/aarch64/pragma_cpp_predefs_4.c |  34 ++
 .../aarch64/sme/acle-asm/read_hor_za128.c     |  31 +
 .../aarch64/sme/acle-asm/read_hor_za8.c       |  31 +
 .../aarch64/sme/acle-asm/read_ver_za128.c     |  31 +
 .../aarch64/sme/acle-asm/read_ver_za8.c       |  31 +
 .../aarch64/sme/acle-asm/revd_mf8.c           |  76 +++
 .../aarch64/sme/acle-asm/test_sme_acle.h      |   2 +-
 .../aarch64/sme/acle-asm/write_hor_za128.c    |  10 +
 .../aarch64/sme/acle-asm/write_hor_za8.c      |  10 +
 .../aarch64/sme/acle-asm/write_ver_za128.c    |  10 +
 .../aarch64/sme/acle-asm/write_ver_za8.c      |  10 +
 .../aarch64/sme2/aarch64-sme2-acle-asm.exp    |   5 +-
 .../aarch64/sme2/acle-asm/cvt_mf8.c           |  47 ++
 .../aarch64/sme2/acle-asm/cvt_mf8_bf16_x2.c   |  56 ++
 .../aarch64/sme2/acle-asm/cvt_mf8_f16_x2.c    |  56 ++
 .../aarch64/sme2/acle-asm/cvt_mf8_f32_x4.c    |  72 +++
 .../aarch64/sme2/acle-asm/cvtl_mf8.c          |  47 ++
 .../aarch64/sme2/acle-asm/cvtn_mf8_f32_x4.c   |  72 +++
 .../sme2/acle-asm/dot_lane_za16_mf8_vg1x2.c   | 119 ++++
 .../sme2/acle-asm/dot_lane_za16_mf8_vg1x4.c   | 125 ++++
 .../sme2/acle-asm/dot_lane_za32_mf8_vg1x2.c   | 119 ++++
 .../sme2/acle-asm/dot_lane_za32_mf8_vg1x4.c   | 125 ++++
 .../sme2/acle-asm/dot_single_za16_mf8_vg1x2.c | 126 ++++
 .../sme2/acle-asm/dot_single_za16_mf8_vg1x4.c | 126 ++++
 .../sme2/acle-asm/dot_single_za32_mf8_vg1x2.c | 126 ++++
 .../sme2/acle-asm/dot_single_za32_mf8_vg1x4.c | 126 ++++
 .../sme2/acle-asm/dot_za16_mf8_vg1x2.c        | 150 +++++
 .../sme2/acle-asm/dot_za16_mf8_vg1x4.c        | 166 +++++
 .../sme2/acle-asm/dot_za32_mf8_vg1x2.c        | 150 +++++
 .../sme2/acle-asm/dot_za32_mf8_vg1x4.c        | 166 +++++
 .../aarch64/sme2/acle-asm/ld1_mf8_x2.c        | 262 ++++++++
 .../aarch64/sme2/acle-asm/ld1_mf8_x4.c        | 354 +++++++++++
 .../aarch64/sme2/acle-asm/ldnt1_mf8_x2.c      | 262 ++++++++
 .../aarch64/sme2/acle-asm/ldnt1_mf8_x4.c      | 354 +++++++++++
 .../aarch64/sme2/acle-asm/luti2_mf8.c         |  48 ++
 .../aarch64/sme2/acle-asm/luti2_mf8_x2.c      |  50 ++
 .../aarch64/sme2/acle-asm/luti2_mf8_x4.c      |  56 ++
 .../aarch64/sme2/acle-asm/luti4_mf8.c         |  48 ++
 .../aarch64/sme2/acle-asm/luti4_mf8_x2.c      |  50 ++
 .../sme2/acle-asm/mla_lane_za16_mf8_vg2x1.c   | 167 +++++
 .../sme2/acle-asm/mla_lane_za16_mf8_vg2x2.c   | 136 +++++
 .../sme2/acle-asm/mla_lane_za16_mf8_vg2x4.c   | 142 +++++
 .../sme2/acle-asm/mla_lane_za32_mf8_vg4x1.c   | 169 ++++++
 .../sme2/acle-asm/mla_lane_za32_mf8_vg4x2.c   | 137 +++++
 .../sme2/acle-asm/mla_lane_za32_mf8_vg4x4.c   | 143 +++++
 .../sme2/acle-asm/mla_za16_mf8_vg2x1.c        | 167 +++++
 .../sme2/acle-asm/mla_za16_mf8_vg2x2.c        | 285 +++++++++
 .../sme2/acle-asm/mla_za16_mf8_vg2x4.c        | 287 +++++++++
 .../sme2/acle-asm/mla_za32_mf8_vg4x1.c        | 167 +++++
 .../sme2/acle-asm/mla_za32_mf8_vg4x2.c        | 277 +++++++++
 .../sme2/acle-asm/mla_za32_mf8_vg4x4.c        | 289 +++++++++
 .../aarch64/sme2/acle-asm/mopa_za16_mf8.c     |  36 ++
 .../aarch64/sme2/acle-asm/mopa_za32_mf8.c     |  36 ++
 .../aarch64/sme2/acle-asm/read_hor_za8_vg2.c  |  78 +++
 .../aarch64/sme2/acle-asm/read_hor_za8_vg4.c  |  91 +++
 .../aarch64/sme2/acle-asm/read_ver_za8_vg2.c  |  78 +++
 .../aarch64/sme2/acle-asm/read_ver_za8_vg4.c  |  91 +++
 .../aarch64/sme2/acle-asm/read_za8_vg1x2.c    |  48 ++
 .../aarch64/sme2/acle-asm/read_za8_vg1x4.c    |  54 ++
 .../aarch64/sme2/acle-asm/readz_hor_za128.c   |  10 +
 .../aarch64/sme2/acle-asm/readz_hor_za8.c     |  10 +
 .../aarch64/sme2/acle-asm/readz_hor_za8_vg2.c |  78 +++
 .../aarch64/sme2/acle-asm/readz_hor_za8_vg4.c |  91 +++
 .../aarch64/sme2/acle-asm/readz_ver_za128.c   | 197 ++++++
 .../aarch64/sme2/acle-asm/readz_ver_za8.c     |  10 +
 .../aarch64/sme2/acle-asm/readz_ver_za8_vg2.c |  77 +++
 .../aarch64/sme2/acle-asm/readz_ver_za8_vg4.c |  90 +++
 .../aarch64/sme2/acle-asm/readz_za8_vg1x2.c   |  48 ++
 .../aarch64/sme2/acle-asm/readz_za8_vg1x4.c   |  56 ++
 .../aarch64/sme2/acle-asm/scale_f16_x2.c      | 192 ++++++
 .../aarch64/sme2/acle-asm/scale_f16_x4.c      | 229 +++++++
 .../aarch64/sme2/acle-asm/scale_f32_x2.c      | 208 +++++++
 .../aarch64/sme2/acle-asm/scale_f32_x4.c      | 229 +++++++
 .../aarch64/sme2/acle-asm/scale_f64_x2.c      | 208 +++++++
 .../aarch64/sme2/acle-asm/scale_f64_x4.c      | 229 +++++++
 .../aarch64/sme2/acle-asm/sel_mf8_x2.c        |  92 +++
 .../aarch64/sme2/acle-asm/sel_mf8_x4.c        |  92 +++
 .../aarch64/sme2/acle-asm/st1_mf8_x2.c        | 262 ++++++++
 .../aarch64/sme2/acle-asm/st1_mf8_x4.c        | 354 +++++++++++
 .../aarch64/sme2/acle-asm/stnt1_mf8_x2.c      | 262 ++++++++
 .../aarch64/sme2/acle-asm/stnt1_mf8_x4.c      | 354 +++++++++++
 .../aarch64/sme2/acle-asm/test_sme2_acle.h    |  12 +-
 .../aarch64/sme2/acle-asm/uzp_mf8_x2.c        |  77 +++
 .../aarch64/sme2/acle-asm/uzp_mf8_x4.c        |  73 +++
 .../aarch64/sme2/acle-asm/uzpq_mf8_x2.c       |  77 +++
 .../aarch64/sme2/acle-asm/uzpq_mf8_x4.c       |  73 +++
 .../sme2/acle-asm/vdot_lane_za16_mf8_vg1x2.c  | 119 ++++
 .../sme2/acle-asm/vdotb_lane_za32_mf8_vg1x4.c | 119 ++++
 .../sme2/acle-asm/vdott_lane_za32_mf8_vg1x4.c | 119 ++++
 .../aarch64/sme2/acle-asm/write_hor_za8_vg2.c |  78 +++
 .../aarch64/sme2/acle-asm/write_hor_za8_vg4.c |  91 +++
 .../aarch64/sme2/acle-asm/write_ver_za8_vg2.c |  78 +++
 .../aarch64/sme2/acle-asm/write_ver_za8_vg4.c |  91 +++
 .../aarch64/sme2/acle-asm/write_za8_vg1x2.c   |  48 ++
 .../aarch64/sme2/acle-asm/write_za8_vg1x4.c   |  54 ++
 .../aarch64/sme2/acle-asm/zip_mf8_x2.c        |  77 +++
 .../aarch64/sme2/acle-asm/zip_mf8_x4.c        |  73 +++
 .../aarch64/sme2/acle-asm/zipq_mf8_x2.c       |  77 +++
 .../aarch64/sme2/acle-asm/zipq_mf8_x4.c       |  73 +++
 .../aarch64/sve/acle/asm/test_sve_acle.h      |   3 +
 .../sve/acle/general-c/binary_za_m_1.c        |  14 +
 .../acle/general-c/binary_za_slice_lane_1.c   |  14 +
 .../general-c/binary_za_slice_opt_single_1.c  |  16 +
 .../general-c/dot_half_za_slice_lane_fpm.c    | 106 ++++
 .../aarch64/sve2/acle/asm/ld1_mf8_x2.c        | 269 ++++++++
 .../aarch64/sve2/acle/asm/ld1_mf8_x4.c        | 361 +++++++++++
 .../aarch64/sve2/acle/asm/ldnt1_mf8_x2.c      | 269 ++++++++
 .../aarch64/sve2/acle/asm/ldnt1_mf8_x4.c      | 361 +++++++++++
 .../aarch64/sve2/acle/asm/revd_mf8.c          |  80 +++
 .../aarch64/sve2/acle/asm/stnt1_mf8_x2.c      | 269 ++++++++
 .../aarch64/sve2/acle/asm/stnt1_mf8_x4.c      | 361 +++++++++++
 gcc/testsuite/lib/target-supports.exp         |   1 +
 133 files changed, 14461 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/acle-asm/revd_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_bf16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_f16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvt_mf8_f32_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvtl_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/cvtn_mf8_f32_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ld1_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ld1_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ldnt1_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/ldnt1_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti2_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti2_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti2_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti4_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/luti4_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mopa_za16_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/mopa_za32_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/readz_ver_za128.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f16_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f16_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f32_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f32_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f64_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/scale_f64_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/sel_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/sel_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/st1_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/st1_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/stnt1_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/stnt1_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/vdot_lane_za16_mf8_vg1x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/vdotb_lane_za32_mf8_vg1x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/vdott_lane_za32_mf8_vg1x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zip_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zip_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zipq_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/zipq_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/dot_half_za_slice_lane_fpm.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ld1_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ld1_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_mf8_x4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/revd_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_mf8_x2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_mf8_x4.c

-- 
2.51.0
