Changes from v1: Add powerpc_future_compile_ok and lp64 dg-require-effective-target directives to the testcases.
This patch depends on "Support for Dense Math Registers":
https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713352.html

This patch adds support for the Dense Math Facility (DMF) instructions and
the MMA+ (Matrix-Multiply Assist Plus) instructions that may be available
in a future Power processor.  The changes extend the existing MMA
infrastructure to support the new DMR (Dense Math Register) operations.

Key changes:

1. Extended MMA operand support from 7 to 9 operands (MAX_MMA_OPERANDS).

2. Added new DMF-specific unspecs for DMR operations:
   - UNSPEC_DMF_INSERT1024 for 1024-bit DMR insertions
   - UNSPEC_DMF_DMXOR for DMR XOR operations
   - UNSPEC_DMF_DMXVI8GERX4* for DMR GER (outer product) operations
   - UNSPEC_DMF_PMDMXVI8GERX4* for prefixed DMR GER operations

3. Implemented new instruction patterns in mma.md:
   - dm_insert1024: Insert 1024-bit data into a DMR
   - dmf_dmxor: XOR operation on DMR registers
   - dmf_dmxvi8gerx4/dmxvi8gerx4pp: DMR outer product operations
   - dmf_pmdmxvi8gerx4/pmdmxvi8gerx4pp: Prefixed DMR outer product
     operations

4. Added a new predicate for MMA disassemble inputs:
   - mma_disassemble_input_operand: Accepts a VSX register when dense
     math is enabled, or an FPR when MMA is enabled without dense math

5. Extended the builtin infrastructure:
   - Added 8 new DMF builtins (DMSETDMRZ, DMMR, DMXOR, BUILD_DMR, etc.),
     each with an internal variant
   - Added dm, dmint, and dmr attributes for builtin classification
   - Updated GIMPLE folding to handle DMR pass-by-reference semantics
   - Extended builtin expansion to support 8- and 9-operand instructions

6. Added ISA support:
   - New "dmf" ISA attribute for Dense Math instructions (along with
     "ftr" and "mma")
   - New "dmf" instruction type for scheduling
   - TARGET_DENSE_MATH feature flag support

The implementation follows the existing MMA scheme: user-facing builtins
take accumulator/DMR arguments by reference, while the internal builtins
they are folded into take them by value, which is easier to optimize.

2026-04-11  Peter Bergner  <[email protected]>
	    Surya Kumari Jangala  <[email protected]>

gcc:
	* config/rs6000/mma.md (MAX_MMA_OPERANDS): Increase from 7 to 9.
	(UNSPEC_DMF_INSERT1024): New unspec.
	(UNSPEC_DMF_DMXOR): Likewise.
	(UNSPEC_DMF_DMXVI8GERX4): Likewise.
	(UNSPEC_DMF_DMXVI8GERX4PP): Likewise.
	(UNSPEC_DMF_PMDMXVI8GERX4): Likewise.
	(UNSPEC_DMF_PMDMXVI8GERX4PP): Likewise.
	(DMF_PV): New int iterator.
	(DMF_DPV): Likewise.
	(DMF_PVI8I4I4): Likewise.
	(DMF_DPVI8I4I4): Likewise.
	(pv): Update attribute.
	(apv): Likewise.
	(pvi8i4i4): New DMF attribute.
	(dpvi8i4i4): Likewise.
	(dm_insert1024): New insn pattern.
	(dmf_build_dmr): New define_expand.
	(dmf_dmsetdmrz): New insn.
	(*movtdo): Call gen_dm_insert1024.
	(movtdo_insert512_upper): Delete insn.
	(movtdo_insert512_lower): Likewise.
	(dmf_dmxor): New insn pattern.
	(reload_tdo_from_memory): Rewrite to call gen_dm_insert1024.
	(dmf_<pv>): New insn pattern.
	(dmf_<apv>): Likewise.
	(dmf_<pvi8i4i4>): Likewise.
	(dmf_<dpvi8i4i4>): Likewise.
	* config/rs6000/predicates.md (mma_disassemble_input_operand): New
	predicate.
	* config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported): Add
	support for ENB_DM.
	(rs6000_gimple_fold_mma_builtin): Add support for DMR builtins.
	Support 8-9 operand builtins.  Handle DMR pass-by-reference semantics.
	(mma_expand_builtin): Add cases for 8 and 9 operands.
	(rs6000_expand_builtin): Add DMF builtin support.  Increase
	MAX_BUILTIN_ARGS from 6 to 8.
	* config/rs6000/rs6000-builtins.def: Add [dm] stanza with 8 new DMF
	builtins: DMSETDMRZ, DMMR, DMXOR, BUILD_DMR, DMXVI8GERX4,
	DMXVI8GERX4PP, PMDMXVI8GERX4, PMDMXVI8GERX4PP and their internal
	variants.
	* config/rs6000/rs6000-gen-builtins.cc (bif_stanza): Add BSTZ_DM.
	(stanza_map): Add "dm" entry.
	(enable_string): Add "ENB_DM".
	(basetype): Add BT_DMR.
	(attrinfo): Add isdm, isdmint, isdmr fields.
	(type_map): Add dm1024 and ptr_dm1024 mappings.
	(match_type): Handle dm1024 type.
	(parse_bif_attrs): Parse dm, dmint, dmr attributes.
	(complete_vector_type): Handle BT_DMR case.
	(write_decls): Add ENB_DM, bif_dm_bit, bif_dmint_bit, bif_dmr_bit.
	(write_bif_static_init): Handle dm attributes in builtin
	initialization.
	* config/rs6000/rs6000.md (type): Add dmf type.
	(isa): Add ftr, mma, dmf ISA attributes.
	(enabled): Add conditions for ftr, mma, dmf ISA support.
	* doc/extend.texi (PowerPC Dense Math Facility Built-in Functions):
	Add documentation for Dense Math Facility (DMF) builtins available
	on a future ISA.
	(PowerPC Matrix-Multiply Assist Built-in Functions): Add
	documentation for MMA+ builtins available on a future ISA.

gcc/testsuite:
	* gcc.target/powerpc/dmf-build-dmr.c: New test.
	* gcc.target/powerpc/dmf-builtin.c: New test.
---
 gcc/config/rs6000/mma.md                 | 190 +++++++++++++++---
 gcc/config/rs6000/predicates.md          |  13 +-
 gcc/config/rs6000/rs6000-builtin.cc      |  87 +++++---
 gcc/config/rs6000/rs6000-builtins.def    |  56 ++++++
 gcc/config/rs6000/rs6000-gen-builtins.cc |  63 +++++-
 gcc/config/rs6000/rs6000.md              |  16 +-
 gcc/doc/extend.texi                      |  38 ++++
 .../gcc.target/powerpc/dmf-build-dmr.c   |  15 ++
 .../gcc.target/powerpc/dmf-builtin.c     |  81 ++++++++
 9 files changed, 495 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dmf-build-dmr.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dmf-builtin.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index e4e613c55bf..7ad6ad07647 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -24,7 +24,7 @@
 ;; __vector_pair types that the MMA built-in functions reference. We
 ;; use OPAQUE_MODE to prevent anything from trying to open them up.
-(define_constants [(MAX_MMA_OPERANDS 7)]) +(define_constants [(MAX_MMA_OPERANDS 9)]) ;; Constants for creating unspecs @@ -93,9 +93,15 @@ (define_c_enum "unspec" UNSPEC_MMA_DMSETDMRZ UNSPEC_DM_INSERT512_UPPER UNSPEC_DM_INSERT512_LOWER + UNSPEC_DMF_INSERT1024 UNSPEC_DM_EXTRACT512 UNSPEC_DM_RELOAD_FROM_MEMORY UNSPEC_DM_RELOAD_TO_MEMORY + UNSPEC_DMF_DMXOR + UNSPEC_DMF_DMXVI8GERX4 + UNSPEC_DMF_DMXVI8GERX4PP + UNSPEC_DMF_PMDMXVI8GERX4 + UNSPEC_DMF_PMDMXVI8GERX4PP ]) (define_c_enum "unspecv" @@ -138,12 +144,18 @@ (define_int_iterator MMA_AVV [UNSPEC_MMA_XVI4GER8PP ;; MMA instructions with 1 vector pair and 1 vector arguments (define_int_iterator MMA_PV [UNSPEC_MMA_XVF64GER]) +;; DMF instructions with 1 vector pair and 1 vector arguments +(define_int_iterator DMF_PV [UNSPEC_DMF_DMXVI8GERX4]) + ;; MMA instructions with 1 accumulator, 1 vector pair and 1 vector arguments (define_int_iterator MMA_APV [UNSPEC_MMA_XVF64GERPP UNSPEC_MMA_XVF64GERPN UNSPEC_MMA_XVF64GERNP UNSPEC_MMA_XVF64GERNN]) +;; DMF instructions with 1 dmr, 1 vector pair and 1 vector arguments +(define_int_iterator DMF_DPV [UNSPEC_DMF_DMXVI8GERX4PP]) + ;; MMA instructions with 2 vector, 2 4-bit and 1 8-bit arguments (define_int_iterator MMA_VVI4I4I8 [UNSPEC_MMA_PMXVI4GER8]) @@ -193,6 +205,14 @@ (define_int_iterator MMA_VVI4I4I4 [UNSPEC_MMA_PMXVI8GER4]) (define_int_iterator MMA_AVVI4I4I4 [UNSPEC_MMA_PMXVI8GER4PP UNSPEC_MMA_PMXVI8GER4SPP]) +;; DMF instructions with 1 vector pair, 1 vector and 1 8-bit and 2 4-bit +;; arguments +(define_int_iterator DMF_PVI8I4I4 [UNSPEC_DMF_PMDMXVI8GERX4]) + +;; DMF instructions with 1 dmr, 1 vector pair, 1 vector and 1 8-bit and +;; 2 4-bit arguments +(define_int_iterator DMF_DPVI8I4I4 [UNSPEC_DMF_PMDMXVI8GERX4PP]) + (define_int_attr acc [(UNSPEC_MMA_XXMFACC "xxmfacc") (UNSPEC_MMA_XXMTACC "xxmtacc")]) @@ -222,12 +242,14 @@ (define_int_attr avv [(UNSPEC_MMA_XVI4GER8PP "xvi4ger8pp") (UNSPEC_MMA_XVF32GERNP "xvf32gernp") (UNSPEC_MMA_XVF32GERNN "xvf32gernn")]) -(define_int_attr pv
[(UNSPEC_MMA_XVF64GER "xvf64ger")]) +(define_int_attr pv [(UNSPEC_MMA_XVF64GER "xvf64ger") + (UNSPEC_DMF_DMXVI8GERX4 "dmxvi8gerx4")]) (define_int_attr apv [(UNSPEC_MMA_XVF64GERPP "xvf64gerpp") (UNSPEC_MMA_XVF64GERPN "xvf64gerpn") (UNSPEC_MMA_XVF64GERNP "xvf64gernp") - (UNSPEC_MMA_XVF64GERNN "xvf64gernn")]) + (UNSPEC_MMA_XVF64GERNN "xvf64gernn") + (UNSPEC_DMF_DMXVI8GERX4PP "dmxvi8gerx4pp")]) (define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")]) @@ -268,6 +290,10 @@ (define_int_attr vvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4 "pmxvi8ger4")]) (define_int_attr avvi4i4i4 [(UNSPEC_MMA_PMXVI8GER4PP "pmxvi8ger4pp") (UNSPEC_MMA_PMXVI8GER4SPP "pmxvi8ger4spp")]) +(define_int_attr pvi8i4i4 [(UNSPEC_DMF_PMDMXVI8GERX4 "pmdmxvi8gerx4")]) + +(define_int_attr dpvi8i4i4 [(UNSPEC_DMF_PMDMXVI8GERX4PP "pmdmxvi8gerx4pp")]) + ;; Vector pair support. OOmode can only live in VSRs. (define_expand "movoo" @@ -439,6 +465,18 @@ (define_insn "dm_insert512" "dmxxinstdmr512 %0,%x1,%x2,%3" [(set_attr "type" "mma")]) +(define_insn "dm_insert1024" + [(set (match_operand:TDO 0 "dense_math_operand" "=wD") + (unspec:TDO [(match_operand:OO 1 "vsx_register_operand" "wa") + (match_operand:OO 2 "vsx_register_operand" "wa") + (match_operand:OO 3 "vsx_register_operand" "wa") + (match_operand:OO 4 "vsx_register_operand" "wa")] + UNSPEC_DMF_INSERT1024))] + "TARGET_DENSE_MATH" + "dmxxinstdmr512 %0,%x1,%x2,0\n\tdmxxinstdmr512 %0,%x3,%x4,1" + [(set_attr "type" "mma")]) + + (define_expand "mma_assemble_acc" [(match_operand:XO 0 "fpr_reg_operand") (match_operand:V16QI 1 "mma_assemble_input_operand") @@ -503,6 +541,31 @@ (define_expand "mma_disassemble_acc" DONE; }) + +(define_expand "dmf_build_dmr" + [(match_operand:TDO 0 "dense_math_operand") + (match_operand:V16QI 1 "mma_assemble_input_operand") + (match_operand:V16QI 2 "mma_assemble_input_operand") + (match_operand:V16QI 3 "mma_assemble_input_operand") + (match_operand:V16QI 4 "mma_assemble_input_operand") + (match_operand:V16QI 5 "mma_assemble_input_operand") 
+ (match_operand:V16QI 6 "mma_assemble_input_operand") + (match_operand:V16QI 7 "mma_assemble_input_operand") + (match_operand:V16QI 8 "mma_assemble_input_operand")] + "TARGET_DENSE_MATH" +{ + rtx vp0 = gen_reg_rtx (OOmode); + rtx vp1 = gen_reg_rtx (OOmode); + rtx vp2 = gen_reg_rtx (OOmode); + rtx vp3 = gen_reg_rtx (OOmode); + emit_insn (gen_vsx_assemble_pair (vp0, operands[1], operands[2])); + emit_insn (gen_vsx_assemble_pair (vp1, operands[3], operands[4])); + emit_insn (gen_vsx_assemble_pair (vp2, operands[5], operands[6])); + emit_insn (gen_vsx_assemble_pair (vp3, operands[7], operands[8])); + emit_insn (gen_dm_insert1024 (operands[0], vp0, vp1, vp2, vp3)); + DONE; +}) + ;; If dense math registers are not available, MMA instructions that do ;; not use their accumulators that overlap with FPR registers as an ;; input, still must not allow their vector operands to overlap the @@ -553,6 +616,14 @@ (define_insn "mma_xxsetaccz_dm" "dmsetdmrz %A0" [(set_attr "type" "mma")]) + (define_insn "dmf_dmsetdmrz" + [(set (match_operand:TDO 0 "accumulator_operand" "=wD") + (unspec [(const_int 0)] + UNSPEC_MMA_DMSETDMRZ))] + "TARGET_DENSE_MATH" + "dmsetdmrz %A0" + [(set_attr "type" "mma")]) + ;; MMA operations below. If dense math registers are available, these ;; operations will use the 8 accumultors which are separate registers. 
@@ -804,10 +875,11 @@ (define_insn_and_split "*movtdo" if (DM_REGNO_P (regno0) && VSX_REGNO_P (regno1)) { - rtx op1_upper = gen_rtx_REG (XOmode, regno1); - rtx op1_lower = gen_rtx_REG (XOmode, regno1 + 4); - emit_insn (gen_movtdo_insert512_upper (op0, op1_upper)); - emit_insn (gen_movtdo_insert512_lower (op0, op0, op1_lower)); + rtx pair0 = gen_rtx_REG (OOmode, regno1); + rtx pair1 = gen_rtx_REG (OOmode, regno1 + 2); + rtx pair2 = gen_rtx_REG (OOmode, regno1 + 4); + rtx pair3 = gen_rtx_REG (OOmode, regno1 + 6); + emit_insn (gen_dm_insert1024 (op0, pair0, pair1, pair2, pair3)); DONE; } @@ -831,25 +903,17 @@ (define_insn_and_split "*movtdo" (set_attr "length" "*,*,32,8,*,8") (set_attr "max_prefixed_insns" "4,4,*,*,*,*")]) -;; Move from VSX registers to dense math registers via two insert 512 bit -;; instructions. -(define_insn "movtdo_insert512_upper" - [(set (match_operand:TDO 0 "dense_math_operand" "=wD") - (unspec:TDO [(match_operand:XO 1 "vsx_register_operand" "wa")] - UNSPEC_DM_INSERT512_UPPER))] - "TARGET_DENSE_MATH" - "dmxxinstdmr512 %0,%1,%Y1,0" - [(set_attr "type" "mma")]) -(define_insn "movtdo_insert512_lower" +(define_insn "dmf_dmxor" [(set (match_operand:TDO 0 "dense_math_operand" "=wD") - (unspec:TDO [(match_operand:TDO 1 "dense_math_operand" "0") - (match_operand:XO 2 "vsx_register_operand" "wa")] - UNSPEC_DM_INSERT512_LOWER))] + (unspec:TDO [(match_operand:TDO 1 "dense_math_operand" "0") + (match_operand:TDO 2 "dense_math_operand" "wD")] + UNSPEC_DMF_DMXOR))] "TARGET_DENSE_MATH" - "dmxxinstdmr512 %0,%2,%Y2,1" + "dmxor %0,%1,%2" [(set_attr "type" "mma")]) + ;; Move from dense math registers to VSX registers via two extract 512 bit ;; instructions. 
(define_insn "movtdo_extract512" @@ -868,24 +932,38 @@ (define_insn_and_split "reload_tdo_from_memory" UNSPEC_DM_RELOAD_FROM_MEMORY)) (clobber (match_operand:XO 2 "vsx_register_operand" "=wa"))] "TARGET_DENSE_MATH" + "#" "&& reload_completed" [(const_int 0)] { rtx dest = operands[0]; rtx src = operands[1]; - rtx tmp = operands[2]; - rtx mem_upper = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 0 : 64); - rtx mem_lower = adjust_address (src, XOmode, BYTES_BIG_ENDIAN ? 64 : 0); + rtx pair0 = operands[2]; + rtx pair1 = operands[3]; + rtx pair2 = operands[4]; + rtx pair3 = operands[5]; + + if (BYTES_BIG_ENDIAN) + { + emit_move_insn (pair0, adjust_address (src, OOmode, 0)); + emit_move_insn (pair1, adjust_address (src, OOmode, 32)); + emit_move_insn (pair2, adjust_address (src, OOmode, 64)); + emit_move_insn (pair3, adjust_address (src, OOmode, 96)); + } + else + { + emit_move_insn (pair3, adjust_address (src, OOmode, 0)); + emit_move_insn (pair2, adjust_address (src, OOmode, 32)); + emit_move_insn (pair1, adjust_address (src, OOmode, 64)); + emit_move_insn (pair0, adjust_address (src, OOmode, 96)); + } - emit_move_insn (tmp, mem_upper); - emit_insn (gen_movtdo_insert512_upper (dest, tmp)); - emit_move_insn (tmp, mem_lower); - emit_insn (gen_movtdo_insert512_lower (dest, dest, tmp)); + emit_insn (gen_dm_insert1024 (dest, pair0, pair1, pair2, pair3)); DONE; } - [(set_attr "length" "16") + [(set_attr "length" "20") (set_attr "max_prefixed_insns" "2") (set_attr "type" "vecload")]) @@ -915,3 +993,57 @@ (define_insn_and_split "reload_tdo_to_memory" } [(set_attr "length" "16") (set_attr "max_prefixed_insns" "2")]) + +(define_insn "dmf_<pv>" + [(set (match_operand:TDO 0 "accumulator_operand" "=wD") + (unspec:TDO [(match_operand:OO 1 "vsx_register_operand" "wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa")] + DMF_PV))] + "TARGET_DENSE_MATH" +{ + return "<pv> %0,%x1,%x2"; +} + [(set_attr "type" "dmf")]) + +(define_insn "dmf_<apv>" + [(set (match_operand:TDO 0 
"accumulator_operand" "=wD") + (unspec:TDO [(match_operand:TDO 1 "accumulator_operand" "0") + (match_operand:OO 2 "vsx_register_operand" "wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa")] + DMF_DPV))] + "TARGET_DENSE_MATH" +{ + return "<apv> %0,%x2,%x3"; +} + [(set_attr "type" "dmf")]) + +(define_insn "dmf_<pvi8i4i4>" + [(set (match_operand:TDO 0 "accumulator_operand" "=wD") + (unspec:TDO [(match_operand:OO 1 "vsx_register_operand" "wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa") + (match_operand:SI 3 "u8bit_cint_operand" "n") + (match_operand:SI 4 "const_0_to_15_operand" "n") + (match_operand:SI 5 "const_0_to_15_operand" "n")] + DMF_PVI8I4I4))] + "TARGET_DENSE_MATH" +{ + return "<pvi8i4i4> %0,%x1,%x2,%3,%4,%5"; +} + [(set_attr "type" "dmf") + (set_attr "prefixed" "yes")]) + +(define_insn "dmf_<dpvi8i4i4>" + [(set (match_operand:TDO 0 "accumulator_operand" "=wD") + (unspec:TDO [(match_operand:TDO 1 "accumulator_operand" "0") + (match_operand:OO 2 "vsx_register_operand" "wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa") + (match_operand:SI 4 "u8bit_cint_operand" "n") + (match_operand:SI 5 "const_0_to_15_operand" "n") + (match_operand:SI 6 "const_0_to_15_operand" "n")] + DMF_DPVI8I4I4))] + "TARGET_DENSE_MATH" +{ + return "<dpvi8i4i4> %0,%x2,%x3,%4,%5,%6"; +} + [(set_attr "type" "dmf") + (set_attr "prefixed" "yes")]) diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 5de81d54507..03157706a7a 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1408,7 +1408,18 @@ (define_special_predicate "mma_assemble_input_operand" && (indexed_or_indirect_address (XEXP (op, 0), mode) || quad_address_p (XEXP (op, 0), mode, false)))))")) -;; Return 1 if this operand is valid for an MMA disassemble insn. +;; Return 1 if this input operand is valid for an MMA disassemble insn. 
+(define_predicate "mma_disassemble_input_operand" + (match_code "reg") +{ + if (TARGET_DENSE_MATH) + return vsx_register_operand (op, mode); + else if (TARGET_MMA) + return fpr_reg_operand (op, mode); + return 0; +}) + +;; Return 1 if this output operand is valid for an MMA disassemble insn. (define_predicate "mma_disassemble_output_operand" (match_code "reg,subreg,mem") { diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index 88e5b3997f9..64224b7697a 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -194,6 +194,8 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode) return TARGET_HTM; case ENB_MMA: return TARGET_MMA; + case ENB_DM: + return TARGET_DENSE_MATH; default: gcc_unreachable (); } @@ -1084,7 +1086,8 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi, gimple *stmt = gsi_stmt (*gsi); size_t fncode = (size_t) fn_code; - if (!bif_is_mma (rs6000_builtin_info[fncode])) + if (!bif_is_mma (rs6000_builtin_info[fncode]) + && !bif_is_dm (rs6000_builtin_info[fncode])) return false; /* Each call that can be gimple-expanded has an associated built-in @@ -1092,11 +1095,11 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi, already expanded it! Exceptions: lxvp and stxvp. */ if (rs6000_builtin_info[fncode].assoc_bif == RS6000_BIF_NONE && fncode != RS6000_BIF_LXVP - && fncode != RS6000_BIF_STXVP) + && fncode != RS6000_BIF_STXVP + && fncode != RS6000_BIF_DMMR) return false; bifdata *bd = &rs6000_builtin_info[fncode]; - unsigned nopnds = bd->nargs; gimple_seq new_seq = NULL; gimple *new_call; tree new_decl; @@ -1213,27 +1216,49 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi, /* Convert this built-in into an internal version that uses pass-by-value arguments. The internal built-in is found in the assoc_bif field. 
*/ - new_decl = rs6000_builtin_decls[rs6000_builtin_info[fncode].assoc_bif]; + size_t new_fncode = rs6000_builtin_info[fncode].assoc_bif; + new_decl = rs6000_builtin_decls[new_fncode]; tree lhs, op[MAX_MMA_OPERANDS]; + tree lhs_type = NULL_TREE; tree acc = gimple_call_arg (stmt, 0); push_gimplify_context (true); - if (bif_is_quad (*bd)) + switch (insn_data[rs6000_builtin_info[new_fncode].icode].operand[0].mode) { - /* This built-in has a pass-by-reference accumulator input, so load it - into a temporary accumulator for use as a pass-by-value input. */ - op[0] = make_ssa_name (vector_quad_type_node); - for (unsigned i = 1; i < nopnds; i++) - op[i] = gimple_call_arg (stmt, i); - gimplify_assign (op[0], build_simple_mem_ref (acc), &new_seq); + case TDOmode: + lhs_type = dm1024_type_node; + break; + case XOmode: + lhs_type = vector_quad_type_node; + break; + case OOmode: + lhs_type = vector_pair_type_node; + break; + default: + gcc_unreachable (); } - else - { - /* This built-in does not use its pass-by-reference accumulator argument - as an input argument, so remove it from the input list. */ - nopnds--; - for (unsigned i = 0; i < nopnds; i++) - op[i] = gimple_call_arg (stmt, i + 1); + + unsigned nopnds = 0; + for (int i = 0; i < bd->nargs; i++) + { + tree arg = gimple_call_arg (stmt, i); + if (i == 0 && !bif_is_dmr (*bd) && !bif_is_quad (*bd)) + continue; + /* If this is another DMR operand, it is passed in by reference. + The internal built-ins use pass-by-value, so load this operand + into a variable and pass that in as our operand. 
*/ + if (POINTER_TYPE_P (TREE_TYPE (arg)) + && TREE_TYPE (TREE_TYPE (arg)) == lhs_type) + { + tree op_mem = build_simple_mem_ref (build1 (NOP_EXPR, + TREE_TYPE (arg), + arg)); + op[nopnds] = make_ssa_name (lhs_type); + gimplify_assign (op[nopnds], op_mem, &new_seq); + } + else + op[nopnds] = arg; + nopnds++; } switch (nopnds) @@ -1265,14 +1290,19 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi, new_call = gimple_build_call (new_decl, 7, op[0], op[1], op[2], op[3], op[4], op[5], op[6]); break; + case 8: + new_call = gimple_build_call (new_decl, 8, op[0], op[1], op[2], op[3], + op[4], op[5], op[6], op[7]); + break; + case 9: + new_call = gimple_build_call (new_decl, 9, op[0], op[1], op[2], op[3], + op[4], op[5], op[6], op[7], op[8]); + break; default: gcc_unreachable (); } - if (fncode == RS6000_BIF_BUILD_PAIR || fncode == RS6000_BIF_ASSEMBLE_PAIR_V) - lhs = make_ssa_name (vector_pair_type_node); - else - lhs = make_ssa_name (vector_quad_type_node); + lhs = make_ssa_name (lhs_type); gimple_call_set_lhs (new_call, lhs); gimple_seq_add_stmt (&new_seq, new_call); gimplify_assign (build_simple_mem_ref (acc), lhs, &new_seq); @@ -2989,6 +3019,14 @@ mma_expand_builtin (tree exp, rtx target, insn_code icode, case 7: pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], op[6]); break; + case 8: + pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], op[6], + op[7]); + break; + case 9: + pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], op[6], + op[7], op[8]); + break; default: gcc_unreachable (); } @@ -3425,7 +3463,7 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */, /* Position of first argument (0 for void-returning functions, else 1). */ int k; /* Modes for the return value, if any, and arguments. 
*/ - const int MAX_BUILTIN_ARGS = 6; + const int MAX_BUILTIN_ARGS = 8; machine_mode mode[MAX_BUILTIN_ARGS + 1]; if (void_func) @@ -3560,7 +3598,8 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */, if (bif_is_lxvrze (*bifaddr)) return lxvrze_expand_builtin (target, icode, op, mode[0], mode[1]); - if (bif_is_mma (*bifaddr)) + if (bif_is_mma (*bifaddr) + || bif_is_dm (*bifaddr)) return mma_expand_builtin (exp, target, icode, fcode); if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 7e5a4fb96e7..513c69e48b8 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -137,6 +137,8 @@ ; endian Needs special handling for endianness ; ibmld Restrict usage to the case when TFmode is IBM-128 ; ibm128 Restrict usage to the case where __ibm128 is supported or if ibmld +; dm Restrict usage to dense math +; dmr MMA instruction using a dmr register as an input operand ; ; Each attribute corresponds to extra processing required when ; the built-in is expanded. 
All such special processing should @@ -3924,3 +3926,57 @@ void __builtin_vsx_stxvp (v256, unsigned long, const v256 *); STXVP nothing {mma,pair} + +[dm] + void __builtin_dmsetdmrz (dm1024 *); + DMSETDMRZ nothing {dm,dmint} + + dm1024 __builtin_dmsetdmrz_internal (); + DMSETDMRZ_INTERNAL dmf_dmsetdmrz {dm} + + void __builtin_dmmr (dm1024 *, dm1024 *); + DMMR nothing {dm,dmint} + + dm1024 __builtin_dmmr_internal (dm1024); + DMMR_INTERNAL movtdo {dm} + + void __builtin_dmxor (dm1024 *, dm1024 *); + DMXOR nothing {dm,dmint,dmr} + + dm1024 __builtin_dmxor_internal (dm1024, dm1024); + DMXOR_INTERNAL dmf_dmxor {dm} + + void __builtin_build_dmr (dm1024 *, vuc, vuc, vuc, vuc, vuc, vuc, vuc, vuc); + BUILD_DMR nothing {dm,dmint} + + dm1024 __builtin_build_dmr_internal (vuc, vuc, vuc, vuc, vuc, vuc, vuc, vuc); + BUILD_DMR_INTERNAL dmf_build_dmr {dm} + + void __builtin_mma_dmxvi8gerx4 (dm1024 *, v256, vuc); + DMXVI8GERX4 nothing {dm,dmint} + + dm1024 __builtin_mma_dmxvi8gerx4_internal (v256, vuc); + DMXVI8GERX4_INTERNAL dmf_dmxvi8gerx4 {dm} + + void __builtin_mma_dmxvi8gerx4pp (dm1024 *, v256, vuc); + DMXVI8GERX4PP nothing {dm,dmint,dmr} + + dm1024 __builtin_mma_dmxvi8gerx4pp_internal (dm1024, v256, vuc); + DMXVI8GERX4PP_INTERNAL dmf_dmxvi8gerx4pp {dm} + + void __builtin_mma_pmdmxvi8gerx4 (dm1024 *, v256, vuc, const int<8>, \ + const int<4>, const int<4>); + PMDMXVI8GERX4 nothing {dm,pair,dmint} + + dm1024 __builtin_mma_pmdmxvi8gerx4_internal (v256, vuc, const int<8>, \ + const int<4>, const int<4>); + PMDMXVI8GERX4_INTERNAL dmf_pmdmxvi8gerx4 {dm,pair} + + void __builtin_mma_pmdmxvi8gerx4pp (dm1024 *, v256, vuc, const int<8>, \ + const int<4>, const int<4>); + PMDMXVI8GERX4PP nothing {dm,pair,dmint,dmr} + + dm1024 __builtin_mma_pmdmxvi8gerx4pp_internal (dm1024, v256, vuc, \ + const int<8>, const int<4>, \ + const int<4>); + PMDMXVI8GERX4PP_INTERNAL dmf_pmdmxvi8gerx4pp {dm,pair} diff --git a/gcc/config/rs6000/rs6000-gen-builtins.cc b/gcc/config/rs6000/rs6000-gen-builtins.cc 
index c7ae5899c5c..cd7d8f6f12f 100644 --- a/gcc/config/rs6000/rs6000-gen-builtins.cc +++ b/gcc/config/rs6000/rs6000-gen-builtins.cc @@ -94,6 +94,9 @@ along with GCC; see the file COPYING3. If not see ibmld Restrict usage to the case when TFmode is IBM-128 ibm128 Restrict usage to the case where __ibm128 is supported or if ibmld + dm Needs special handling for DMF/MMA+ instructions + dmint DMF/MMA+ instruction expanding to internal call at GIMPLE time + dmr MMA+ instruction using a dmr register as an input operand An example stanza might look like this: @@ -232,6 +235,7 @@ enum bif_stanza BSTZ_P10, BSTZ_P10_64, BSTZ_MMA, + BSTZ_DM, NUMBIFSTANZAS }; @@ -265,7 +269,8 @@ static stanza_entry stanza_map[NUMBIFSTANZAS] = { "htm", BSTZ_HTM }, { "power10", BSTZ_P10 }, { "power10-64", BSTZ_P10_64 }, - { "mma", BSTZ_MMA } + { "mma", BSTZ_MMA }, + { "dm", BSTZ_DM } }; static const char *enable_string[NUMBIFSTANZAS] = @@ -290,7 +295,8 @@ static const char *enable_string[NUMBIFSTANZAS] = "ENB_HTM", "ENB_P10", "ENB_P10_64", - "ENB_MMA" + "ENB_MMA", + "ENB_DM" }; /* Function modifiers provide special handling for const, pure, and fpmath @@ -324,7 +330,8 @@ enum basetype BT_DECIMAL128, BT_IBM128, BT_VPAIR, - BT_VQUAD + BT_VQUAD, + BT_DMR }; /* Ways in which a const int value can be restricted. RES_BITS indicates @@ -392,6 +399,9 @@ struct attrinfo bool isendian; bool isibmld; bool isibm128; + bool isdm; + bool isdmint; + bool isdmr; }; /* Fields associated with a function prototype (bif or overload). 
*/ @@ -543,6 +553,7 @@ static typemap type_map[] = { "pv16qi", "ptr_V16QI" }, { "pv1poi", "ptr_vector_pair" }, { "pv1pxi", "ptr_vector_quad" }, + { "pv1tdoi", "ptr_dm1024" }, { "pv1ti", "ptr_V1TI" }, { "pv2df", "ptr_V2DF" }, { "pv2di", "ptr_V2DI" }, @@ -573,6 +584,7 @@ static typemap type_map[] = { "v16qi", "V16QI" }, { "v1poi", "vector_pair" }, { "v1pxi", "vector_quad" }, + { "v1tdoi", "dm1024" }, { "v1ti", "V1TI" }, { "v2df", "V2DF" }, { "v2di", "V2DI" }, @@ -1058,6 +1070,7 @@ match_type (typeinfo *typedata, int voidok) vd vector double v256 __vector_pair v512 __vector_quad + dm1024 __dmr For simplicity, We don't support "short int" and "long long int". We don't currently support a <basetype> of "_Float16". "signed" @@ -1239,6 +1252,13 @@ match_type (typeinfo *typedata, int voidok) handle_pointer (typedata); return 1; } + else if (!strcmp (token, "dm1024")) + { + typedata->isvector = 1; + typedata->base = BT_DMR; + handle_pointer (typedata); + return 1; + } else if (!strcmp (token, "signed")) typedata->issigned = 1; else if (!strcmp (token, "unsigned")) @@ -1437,6 +1457,12 @@ parse_bif_attrs (attrinfo *attrptr) attrptr->isibmld = 1; else if (!strcmp (attrname, "ibm128")) attrptr->isibm128 = 1; + else if (!strcmp (attrname, "dm")) + attrptr->isdm = 1; + else if (!strcmp (attrname, "dmint")) + attrptr->isdmint = 1; + else if (!strcmp (attrname, "dmr")) + attrptr->isdmr = 1; else { diag (oldpos, "unknown attribute.\n"); @@ -1470,14 +1496,15 @@ parse_bif_attrs (attrinfo *attrptr) "pred = %d, htm = %d, htmspr = %d, htmcr = %d, mma = %d, " "quad = %d, pair = %d, mmaint = %d, no32bit = %d, 32bit = %d, " "cpu = %d, ldstmask = %d, lxvrse = %d, lxvrze = %d, endian = %d, " - "ibmdld = %d, ibm128 = %d.\n", + "ibmdld = %d, ibm128 = %d, dm = %d, dmint = %d, dmr = %d.\n", attrptr->isextract, attrptr->isnosoft,attrptr->isldvec, attrptr->isstvec, attrptr->isreve, attrptr->ispred, attrptr->ishtm, attrptr->ishtmspr, attrptr->ishtmcr, attrptr->ismma, attrptr->isquad, 
attrptr->ispair, attrptr->ismmaint, attrptr->isno32bit, attrptr->is32bit, attrptr->iscpu, attrptr->isldstmask, attrptr->islxvrse, attrptr->islxvrze, - attrptr->isendian, attrptr->isibmld, attrptr->isibm128); + attrptr->isendian, attrptr->isibmld, attrptr->isibm128, + attrptr->isdm, attrptr->isdmint, attrptr->isdmr); #endif return PC_OK; @@ -1538,6 +1565,10 @@ complete_vector_type (typeinfo *typeptr, char *buf, int *bufi) memcpy (&buf[*bufi], "1pxi", 4); *bufi += 4; break; + case BT_DMR: + memcpy (&buf[*bufi], "1tdoi", 5); + *bufi += 5; + break; default: diag (pos, "unhandled basetype %d.\n", typeptr->base); exit (1); @@ -2249,7 +2280,8 @@ write_decls (void) fprintf (header_file, " ENB_HTM,\n"); fprintf (header_file, " ENB_P10,\n"); fprintf (header_file, " ENB_P10_64,\n"); - fprintf (header_file, " ENB_MMA\n"); + fprintf (header_file, " ENB_MMA,\n"); + fprintf (header_file, " ENB_DM\n"); fprintf (header_file, "};\n\n"); fprintf (header_file, "#define PPC_MAXRESTROPNDS 3\n"); @@ -2291,6 +2323,9 @@ write_decls (void) fprintf (header_file, "#define bif_endian_bit\t\t(0x00200000)\n"); fprintf (header_file, "#define bif_ibmld_bit\t\t(0x00400000)\n"); fprintf (header_file, "#define bif_ibm128_bit\t\t(0x00800000)\n"); + fprintf (header_file, "#define bif_dm_bit\t\t(0x02000000)\n"); + fprintf (header_file, "#define bif_dmint_bit\t\t(0x04000000)\n"); + fprintf (header_file, "#define bif_dmr_bit\t\t(0x08000000)\n"); fprintf (header_file, "\n"); fprintf (header_file, "#define bif_is_extract(x)\t((x).bifattrs & bif_extract_bit)\n"); @@ -2336,6 +2371,12 @@ write_decls (void) "#define bif_is_ibmld(x)\t((x).bifattrs & bif_ibmld_bit)\n"); fprintf (header_file, "#define bif_is_ibm128(x)\t((x).bifattrs & bif_ibm128_bit)\n"); + fprintf (header_file, + "#define bif_is_dm(x)\t((x).bifattrs & bif_dm_bit)\n"); + fprintf (header_file, + "#define bif_is_dmint(x)\t((x).bifattrs & bif_dmint_bit)\n"); + fprintf (header_file, + "#define bif_is_dmr(x)\t((x).bifattrs & bif_dmr_bit)\n"); fprintf 
(header_file, "\n"); fprintf (header_file, @@ -2535,6 +2576,12 @@ write_bif_static_init (void) fprintf (init_file, " | bif_ibmld_bit"); if (bifp->attrs.isibm128) fprintf (init_file, " | bif_ibm128_bit"); + if (bifp->attrs.isdm) + fprintf (init_file, " | bif_dm_bit"); + if (bifp->attrs.isdmint) + fprintf (init_file, " | bif_dmint_bit"); + if (bifp->attrs.isdmr) + fprintf (init_file, " | bif_dmr_bit"); fprintf (init_file, ",\n"); fprintf (init_file, " /* restr_opnd */\t{%d, %d, %d},\n", bifp->proto.restr_opnd[0], bifp->proto.restr_opnd[1], @@ -2568,8 +2615,8 @@ write_bif_static_init (void) : (bifp->kind == FNK_FPMATH ? "= fp, const" : "")))); fprintf (init_file, " /* assoc_bif */\tRS6000_BIF_%s%s\n", - bifp->attrs.ismmaint ? bifp->idname : "NONE", - bifp->attrs.ismmaint ? "_INTERNAL" : ""); + (bifp->attrs.ismmaint || bifp->attrs.isdmint) ? bifp->idname : "NONE", + (bifp->attrs.ismmaint || bifp->attrs.isdmint) ? "_INTERNAL" : ""); fprintf (init_file, " },\n"); } fprintf (init_file, " };\n\n"); diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 57a239791ee..6f34a12cbf8 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -223,7 +223,7 @@ (define_attr "type" vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm, vecfloat,vecfdiv,vecdouble,mtvsr,mfvsr,crypto, veclogical,veccmpfx,vecexts,vecmove, - htm,htmsimple,dfp,mma, + htm,htmsimple,dfp,mma,dmf, fused_arith_logical, fused_cmp_isel, fused_carry, @@ -371,7 +371,7 @@ (define_attr "cpu" (const (symbol_ref "(enum attr_cpu) rs6000_tune"))) ;; The ISA we implement. -(define_attr "isa" "any,p5,p6,p7,p7v,p8,p8v,p9,p9v,p9kf,p9tf,p10" +(define_attr "isa" "any,p5,p6,p7,p7v,p8,p8v,p9,p9v,p9kf,p9tf,p10,ftr,mma,dmf" (const_string "any")) ;; Is this alternative enabled for the current CPU/ISA/etc.? 
@@ -423,6 +423,18 @@ (define_attr "enabled" ""
      (and (eq_attr "isa" "p10")
           (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "ftr")
+          (match_test "TARGET_FUTURE"))
+     (const_int 1)
+
+     (and (eq_attr "isa" "mma")
+          (match_test "TARGET_MMA"))
+     (const_int 1)
+
+     (and (eq_attr "isa" "dmf")
+          (match_test "TARGET_DENSE_MATH"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index eb057fb7c40..7e438940c5b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18572,6 +18572,7 @@ instructions, but allow the compiler to schedule those calls.
 * PowerPC Hardware Transactional Memory Built-in Functions::
 * PowerPC Atomic Memory Operation Functions::
 * PowerPC Matrix-Multiply Assist Built-in Functions::
+* PowerPC Dense Math Facility Built-in Functions::
 * PRU Built-in Functions::
 * RISC-V Built-in Functions::
 * RISC-V Vector Intrinsics::
@@ -27095,6 +27096,43 @@ __vector_pair __builtin_vsx_lxvp (size_t, __vector_pair *);
 void __builtin_vsx_stxvp (__vector_pair, size_t, __vector_pair *);
 @end smallexample
 
+A future PowerPC ISA may add new Matrix-Multiply Assist Plus (MMA+)
+instructions.  GCC provides support for these instructions through the
+following built-in functions which are enabled with the @code{-mmma} option.
+The vec_t type below is defined to be a normal vector unsigned char type.
+The uint4 and uint8 parameters are 4-bit and 8-bit unsigned integer
+constants, respectively.  The compiler will verify that they are
+constants and that their values are within range.  The __dm1024 type is a
+1024-bit integer type.
+
+The built-in functions supported are:
+
+@smallexample
+void __builtin_mma_dmxvi8gerx4 (__dm1024 *, __vector_pair, vec_t);
+void __builtin_mma_dmxvi8gerx4pp (__dm1024 *, __vector_pair, vec_t);
+
+void __builtin_mma_pmdmxvi8gerx4 (__dm1024 *, __vector_pair, vec_t, uint8, uint4, uint4);
+void __builtin_mma_pmdmxvi8gerx4pp (__dm1024 *, __vector_pair, vec_t, uint8, uint4, uint4);
+@end smallexample
+
+@node PowerPC Dense Math Facility Built-in Functions
+@subsection PowerPC Dense Math Facility Built-in Functions
+
+A future PowerPC processor may provide Dense Math Facility (DMF)
+instructions.  GCC provides support for these instructions through the
+following built-in functions which are enabled with the @code{-mdense-math}
+option.  The vec_t type below is defined to be a normal vector unsigned char
+type.  The __dm1024 type is a 1024-bit integer type.
+
+The built-in functions supported are:
+
+@smallexample
+void __builtin_dmsetdmrz (__dm1024 *);
+void __builtin_dmmr (__dm1024 *, __dm1024 *);
+void __builtin_dmxor (__dm1024 *, __dm1024 *);
+void __builtin_build_dmr (__dm1024 *, vec_t, vec_t, vec_t, vec_t, vec_t, vec_t, vec_t, vec_t);
+@end smallexample
+
 @node PRU Built-in Functions
 @subsection PRU Built-in Functions
 
diff --git a/gcc/testsuite/gcc.target/powerpc/dmf-build-dmr.c b/gcc/testsuite/gcc.target/powerpc/dmf-build-dmr.c
new file mode 100644
index 00000000000..ca4feca9b06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dmf-build-dmr.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_compile_ok } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+foo2 (__dm1024 *dst, vec_t *src)
+{
+  __builtin_build_dmr (dst, src[0], src[1], src[2], src[3], src[4], src[5], src[6], src[7]);
+}
+
+/* { dg-final { scan-assembler-times {\mdmxxinstdmr512\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mdmxxextfdmr512\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/dmf-builtin.c b/gcc/testsuite/gcc.target/powerpc/dmf-builtin.c
new file mode 100644
index 00000000000..fb1b1116bf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dmf-builtin.c
@@ -0,0 +1,81 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_compile_ok } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=future -O2" } */
+
+typedef unsigned char vec_t __attribute__((vector_size(16)));
+
+void
+foo (__dm1024 *dst, __vector_pair *vpp, vec_t *src)
+{
+  __dm1024 dmr;
+  __vector_pair vp = *vpp;
+  vec_t vec = *src;
+  __builtin_dmsetdmrz (&dmr);
+  __builtin_mma_dmxvi8gerx4 (&dmr, vp, vec);
+  *dst = dmr;
+}
+
+void
+bar (__dm1024 *dst, __vector_pair *vpp, vec_t *src)
+{
+  __dm1024 dmr = dst[0];
+  __vector_pair vp = *vpp;
+  vec_t vec = *src;
+  __builtin_mma_dmxvi8gerx4 (&dmr, vp, vec);
+  dst[1] = dmr;
+}
+
+/* { dg-final { scan-assembler-times {\mdmxvi8gerx4\M} 2 } } */
+
+void
+foo_1 (__dm1024 *dst, __vector_pair *vpp, vec_t *src)
+{
+  __dm1024 dmr;
+  __vector_pair vp = *vpp;
+  vec_t vec = *src;
+  __builtin_dmsetdmrz (&dmr);
+  __builtin_mma_dmxvi8gerx4pp (&dmr, vp, vec);
+  *dst = dmr;
+}
+
+void
+bar_1 (__dm1024 *dst, __vector_pair *vpp, vec_t *src)
+{
+  __dm1024 dmr = dst[0];
+  __vector_pair vp = *vpp;
+  vec_t vec = *src;
+  __builtin_mma_dmxvi8gerx4pp (&dmr, vp, vec);
+  dst[1] = dmr;
+}
+
+/* { dg-final { scan-assembler-times {\mdmxvi8gerx4pp\M} 2 } } */
+
+void
+foo_2 (__dm1024 *dst, __vector_pair *vpp, vec_t *src)
+{
+  __vector_pair vp = *vpp;
+  vec_t vec = *src;
+  __builtin_mma_pmdmxvi8gerx4 (dst, vp, vec, 255, 15, 2);
+}
+
+/* { dg-final { scan-assembler-times {\mpmdmxvi8gerx4\M} 1 } } */
+
+void
+foo_3 (__dm1024 *dst, __vector_pair *vpp, vec_t *src)
+{
+  __vector_pair vp = *vpp;
+  vec_t vec = *src;
+  __builtin_mma_pmdmxvi8gerx4pp (dst, vp, vec, 255, 15, 2);
+}
+
+/* { dg-final { scan-assembler-times {\mpmdmxvi8gerx4pp\M} 1 } } */
+
+
+void
+foo_5 (__dm1024 *dst, __dm1024 *src)
+{
+  __builtin_dmxor (dst, src);
+}
+
+/* { dg-final { scan-assembler-times {\mdmxor\M} 1 } } */
-- 
2.52.0
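The write_decls and write_bif_static_init hunks give each new attribute (dm, dmint, dmr) its own bit in the generated bifattrs word, plus a bif_is_* predicate to test it. The C sketch below is a stand-alone illustration, not code from the patch. It copies the three new masks and macros, and also bif_ibm128_bit for an overlap check. struct bif_sketch and make_entry are hypothetical stand-ins for a generated table entry and for the bit OR-ing that write_bif_static_init prints.

```c
#include <assert.h>

/* Masks as emitted by write_decls into the generated header.  The
   existing bif_ibm128_bit is copied only to check for overlap.  */
#define bif_ibm128_bit (0x00800000)
#define bif_dm_bit     (0x02000000)
#define bif_dmint_bit  (0x04000000)
#define bif_dmr_bit    (0x08000000)

/* Predicates in the same form as the generated bif_is_* macros.  */
#define bif_is_ibm128(x) ((x).bifattrs & bif_ibm128_bit)
#define bif_is_dm(x)     ((x).bifattrs & bif_dm_bit)
#define bif_is_dmint(x)  ((x).bifattrs & bif_dmint_bit)
#define bif_is_dmr(x)    ((x).bifattrs & bif_dmr_bit)

/* Each attribute must own a distinct bit; otherwise a predicate would
   report an attribute the built-in does not have.  */
static_assert ((bif_dm_bit & bif_dmint_bit) == 0, "dm/dmint overlap");
static_assert ((bif_dm_bit & bif_dmr_bit) == 0, "dm/dmr overlap");
static_assert ((bif_dmint_bit & bif_dmr_bit) == 0, "dmint/dmr overlap");
static_assert (((bif_dm_bit | bif_dmint_bit | bif_dmr_bit)
                & bif_ibm128_bit) == 0, "overlap with ibm128");

/* Hypothetical stand-in for one generated table entry; only the
   bifattrs field that the macros read is modeled.  */
struct bif_sketch
{
  unsigned int bifattrs;
};

/* Hypothetical helper: set one bit per attribute that is present,
   mirroring the " | bif_dm_bit" etc. terms the generator prints.  */
struct bif_sketch
make_entry (int isdm, int isdmint, int isdmr)
{
  struct bif_sketch b = { 0 };
  if (isdm)
    b.bifattrs |= bif_dm_bit;
  if (isdmint)
    b.bifattrs |= bif_dmint_bit;
  if (isdmr)
    b.bifattrs |= bif_dmr_bit;
  return b;
}
```

For example, make_entry (1, 1, 0) satisfies bif_is_dm and bif_is_dmint but not bif_is_dmr or bif_is_ibm128. The static_assert lines only confirm that the masks chosen in write_decls are disjoint from one another and from bif_ibm128_bit.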
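The last write_bif_static_init hunk extends the assoc_bif rule so that built-ins flagged dmint, like those flagged mmaint, are linked to an _INTERNAL counterpart. This is a minimal sketch that assumes nothing beyond what that fprintf call shows. The hypothetical helper assoc_bif_name reproduces the enumerator text the generator prints. EXAMPLE in the usage note is a placeholder idname, not a real built-in.

```c
#include <assert.h>
#include <stdio.h>

/* Reproduce the assoc_bif initializer printed by write_bif_static_init
   after this patch: a built-in flagged mmaint or dmint is linked to
   RS6000_BIF_<idname>_INTERNAL, and any other built-in is linked to
   RS6000_BIF_NONE.  The enumerator text is written into BUF.  */
void
assoc_bif_name (char *buf, size_t len, const char *idname,
                int ismmaint, int isdmint)
{
  int has_internal = ismmaint || isdmint;
  int n = snprintf (buf, len, "RS6000_BIF_%s%s",
                    has_internal ? idname : "NONE",
                    has_internal ? "_INTERNAL" : "");
  /* This sketch expects the caller's buffer to hold the whole name.  */
  assert (n >= 0 && (size_t) n < len);
}
```

With the placeholder idname, assoc_bif_name (buf, sizeof buf, "EXAMPLE", 0, 1) writes RS6000_BIF_EXAMPLE_INTERNAL, exactly as the mmaint case did before the patch. With both flags clear, it writes RS6000_BIF_NONE.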
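The rs6000.md hunks add ftr, mma and dmf as isa values. They also make the enabled attribute accept an alternative tagged with one of them only when TARGET_FUTURE, TARGET_MMA or TARGET_DENSE_MATH, respectively, is true. The C sketch below models that decision for the isa values visible in this patch. struct target_sketch and isa_alternative_enabled are hypothetical. The "any" clause is an assumption, since it is not shown in this excerpt. It must enable the alternative, because "any" is the default isa value and the cond otherwise falls through to (const_int 0).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical bundle of the target macros tested by the isa values
   in this patch.  */
struct target_sketch
{
  bool power10;     /* TARGET_POWER10 */
  bool future;      /* TARGET_FUTURE */
  bool mma;         /* TARGET_MMA */
  bool dense_math;  /* TARGET_DENSE_MATH */
};

/* Model of the "enabled" cond: an alternative is usable when its isa
   value is "any" (assumed clause) or when the matching target macro
   is true.  The p5 ... p9tf values are left out of this sketch and
   fall through to false here.  */
bool
isa_alternative_enabled (const char *isa, const struct target_sketch *t)
{
  if (strcmp (isa, "any") == 0)
    return true;
  if (strcmp (isa, "p10") == 0)
    return t->power10;
  if (strcmp (isa, "ftr") == 0)
    return t->future;
  if (strcmp (isa, "mma") == 0)
    return t->mma;
  if (strcmp (isa, "dmf") == 0)
    return t->dense_math;
  return false;  /* The cond's default: (const_int 0).  */
}
```

So, under these assumptions, a target with MMA but without Dense Math keeps alternatives tagged mma and drops those tagged dmf. This is why the new DMF patterns can share insns with existing MMA alternatives without being selected on older processors.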
