Hi Saurabh, > On 6 Nov 2024, at 11:03, saurabh....@arm.com wrote: > > > The AArch64 FEAT_FP8 extension introduces instructions for conversion > and scaling. > > This patch introduces the following intrinsics: > 1. vcvt{1|2}_{bf16|high_bf16|low_bf16}_mf8_fpm. > 2. vcvt{q}_mf8_f16_fpm. > 3. vcvt_{high}_mf8_f32_fpm. > 4. vscale{q}_{f16|f32|f64}. > > We introduced three new aarch64_builtin_signatures enum variants: > 1. binary_fpm. > 2. ternary_fpm. > 3. unary_fpm. > > We added support for these variants for declaring types and for expanding to > RTL. > > We added new simd_types for integers (s32, s32q, and s64q) and for > fp8 (f8, and f8q). > > Also changed the faminmax intrinsic instruction pattern so that it works > better with the new fscale pattern. > > Because we added support for fp8 intrinsics here, we modified the check > in acle/fp8.c that was checking that __ARM_FEATURE_FP8 macro is not > defined. > > gcc/ChangeLog: > > * config/aarch64/aarch64-builtins.cc > (enum class): New variants to support new signatures. > (aarch64_fntype): Handle new signatures. > (aarch64_expand_pragma_builtin): Handle new signatures. > * config/aarch64/aarch64-c.cc > (aarch64_update_cpp_builtins): New flag for FP8. > * config/aarch64/aarch64-simd-pragma-builtins.def > (ENTRY_BINARY_FPM): Macro to declare unary fpm intrinsics. > (ENTRY_TERNARY_FPM): Macro to declare ternary fpm intrinsics. > (ENTRY_UNARY_FPM): Macro to declare unary fpm intrinsics. > (ENTRY_VHSDF_VHSDI): Macro to declare binary intrinsics. > * config/aarch64/aarch64-simd.md > (@aarch64_<faminmax_uns_op><mode>): Renamed. > (@aarch64_<faminmax_uns_op><VHSDF:mode><VHSDF:mode>): Renamed. > (@aarch64_<fpm_uns_name><V8HFBF:mode><VB:mode>): Unary fpm > pattern. > (@aarch64_<fpm_uns_name><V8HFBF:mode><V16QI_ONLY:mode>): Unary > fpm pattern. > (@aarch64_<fpm_uns_name><VB:mode><VCVTFPM:mode><VH_SF:mode>): > Binary fpm pattern. > (@aarch64_<fpm_uns_name><V16QI_ONLY:mode><V8QI_ONLY:mode><V4SF_ONLY:mode><V4SF_ONLY:mode>): > Ternary fpm pattern. > (@aarch64_<fpm_uns_op><VHSDF:mode><VHSDI:mode>): Scale fpm > pattern. > * config/aarch64/iterators.md: New attributes and iterators. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/acle/fp8.c: Remove check that fp8 feature > macro doesn't exist. > * gcc.target/aarch64/simd/scale_fpm.c: New test. > * gcc.target/aarch64/simd/vcvt_fpm.c: New test. > > --- > > I could not find a way to compress declarations in > aarch64-simd-pragma-builtins.def for convert instructions as there was > no pattern apart from the repetion for vcvt1/vcvt2 types. Let me know > if those declrations can be expressed more concisely. > > In the scale instructions, I am not doing any casting from float to int > modes in the second operand. Let me know if that's a problem. > --- > gcc/config/aarch64/aarch64-builtins.cc | 132 ++++++++++-- > gcc/config/aarch64/aarch64-c.cc | 2 + > .../aarch64/aarch64-simd-pragma-builtins.def | 56 +++++ > gcc/config/aarch64/aarch64-simd.md | 72 ++++++- > gcc/config/aarch64/iterators.md | 99 +++++++++ > gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 10 - > .../gcc.target/aarch64/simd/scale_fpm.c | 60 ++++++ > .../gcc.target/aarch64/simd/vcvt_fpm.c | 197 ++++++++++++++++++ > 8 files changed, 603 insertions(+), 25 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/scale_fpm.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vcvt_fpm.c > > <0001-aarch64-Add-support-for-fp8-convert-and-scale.patch>
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index cfe95bd4c31..87bbfb0e586 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -9982,13 +9982,13 @@ ) ;; faminmax -(define_insn "@aarch64_<faminmax_uns_op><mode>" +(define_insn "@aarch64_<faminmax_uns_op><VHSDF:mode><VHSDF:mode>" [(set (match_operand:VHSDF 0 "register_operand" "=w") (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w") (match_operand:VHSDF 2 "register_operand" "w")] FAMINMAX_UNS))] "TARGET_FAMINMAX" - "<faminmax_uns_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" + "<faminmax_uns_op>\t%0.<Vtype>, %1.<VHSDF:Vtype>, %2.<VHSDF:Vtype>" ) (define_insn "*aarch64_faminmax_fused" @@ -9999,3 +9999,71 @@ "TARGET_FAMINMAX" "<faminmax_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" ) + +;; fpm unary instructions. +(define_insn "@aarch64_<fpm_uns_name><V8HFBF:mode><VB:mode>" + [(set (match_operand:V8HFBF 0 "register_operand" "=w") + (unspec:V8HFBF + [(match_operand:VB 1 "register_operand" "w") + (reg:DI FPM_REGNUM)] + FPM_UNARY_UNS))] + "TARGET_FP8" + "<fpm_uns_op>\t%0.<V8HFBF:Vtype>, %1.<VB:Vtype>" +) + +;; fpm unary instructions, where the input is lowered from V16QI to +;; V8QI. +(define_insn "@aarch64_<fpm_uns_name><V8HFBF:mode><V16QI_ONLY:mode>" + [(set (match_operand:V8HFBF 0 "register_operand" "=w") + (unspec:V8HFBF + [(match_operand:V16QI_ONLY 1 "register_operand" "w") + (reg:DI FPM_REGNUM)] + FPM_UNARY_LOW_UNS))] + "TARGET_FP8" + { + operands[1] = force_lowpart_subreg (V8QImode, + operands[1], + recog_data.operand[1]->mode); I don’t think this is needed? This code is only executed in the final assembly output stage and you already explicitly print operand 1 with a “.8b” suffix so changing the mode here doesn’t matter. + return "<fpm_uns_op>\t%0.<V8HFBF:Vtype>, %1.8b"; + } +) +;; fpm ternary instructions. +(define_insn + "@aarch64_<fpm_uns_name><V16QI_ONLY:mode><V8QI_ONLY:mode><V4SF_ONLY:mode><V4SF_ONLY:mode>" + [(set (match_operand:V16QI_ONLY 0 "register_operand" "=w") + (unspec:V16QI_ONLY + [(match_operand:V8QI_ONLY 1 "register_operand" "w") + (match_operand:V4SF_ONLY 2 "register_operand" "w") + (match_operand:V4SF_ONLY 3 "register_operand" "w") + (reg:DI FPM_REGNUM)] + FPM_TERNARY_VCVT_UNS))] + "TARGET_FP8" + { + operands[1] = force_reg (V16QImode, operands[1]); + return "<fpm_uns_op>\t%1.16b, %2.<V4SF_ONLY:Vtype>, %3.<V4SF_ONLY:Vtype>"; + } +) Same here. But more worryingly the destination operand 0 is not being printed out anywhere here. Was there supposed to be a tie of one of the input operands to operand 0 in this pattern? I haven’t looked deeply into what exactly these instructions do, but please double check the operands here. Thanks, Kyrill