Hi Mike,
Overall, I have the following comments regarding the patch:
1. MMA and Dense Math have to be decoupled. The future processor
can use dense math registers for non-MMA instructions too.
2. Use DMR (rather than DMF) consistently wherever we refer to a dense math
register.
3. Decide what we want TARGET_MMA to refer to. Will it also refer to
the new MMA instructions that may be added to the future processor?
On 14/11/25 1:25 pm, Michael Meissner wrote:
> The MMA subsystem added the notion of accumulator registers as an optional
> feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with
> the VSX registers 0..31, but logically the accumulator registers were separate
> from the FPR registers. In ISA 3.1, it was anticipated that in future systems,
> the accumulator registers may not overlap with the FPR registers. This patch
> adds the support for dense math registers as separate registers.
>
> This particular patch does not change the MMA support to use the accumulators
> within the dense math registers. This patch just adds the basic support for
> having separate DMRs. The next patch will switch the MMA support to use the
> accumulators if -mcpu=future is used.
>
> For testing purposes, I added an undocumented option '-mdense-math' to enable
> or disable the dense math support.
>
> This patch updates the wD constraint added in the previous patch. If MMA is
> selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint
> will allow access to accumulators that overlap with VSX registers 0..31. If
> both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint
> will only allow dense math registers.
The future processor can use dense math registers for certain non-MMA
operations, so the wD constraint should allow dense math registers even if MMA
is not selected. The behaviour of the wD constraint should be determined solely
by the -mdense-math option.
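To make the suggestion concrete, here is a minimal standalone sketch of the
selection I have in mind. It is only a model, not a proposed patch: the enum
values and the function name wD_class_model are made up for illustration and
merely stand in for DM_REGS/FLOAT_REGS/NO_REGS and for the code that sets
rs6000_constraints[RS6000_CONSTRAINT_wD].

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real GCC register classes.  */
enum model_reg_class { MODEL_NO_REGS, MODEL_FLOAT_REGS, MODEL_DM_REGS };

/* Model of the suggested wD selection: dense math alone decides whether
   wD refers to the separate dense math registers.  MMA is only consulted
   for the power10 case, where accumulators overlap the FPRs.  */
static enum model_reg_class
wD_class_model (bool target_mma, bool target_dense_math)
{
  if (target_dense_math)
    return MODEL_DM_REGS;

  if (target_mma)
    return MODEL_FLOAT_REGS;

  return MODEL_NO_REGS;
}
```

With this shape, dense math without MMA still maps wD to the dense math
registers, and -mcpu=power10 keeps its current behaviour.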
>
> This patch modifies the existing %A output modifier. If MMA is selected but
> dense math is not selected, then %A output modifier converts the VSX register
> number to the accumulator number, by dividing it by 4. If both MMA and dense
> math are selected, then %A will map the separate DMF registers into 0..7.
Similarly, the behaviour of the %A output modifier should depend solely on
whether dense math is selected. For the future Power processor, the behaviour
of %A should not depend on MMA.
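A rough standalone model of the %A mapping I am suggesting is below. The
register numbers are assumptions for illustration only (FPRs starting at 32,
and eight dense math registers starting at 111 as mentioned in the
rs6000_debugger_regno comment in this patch); a_modifier_model is a
hypothetical name, and -1 stands in for output_operand_lossage.

```c
#include <stdbool.h>

/* Illustrative register numbers; assumptions, not taken from rs6000.md.  */
#define MODEL_FIRST_FPR_REGNO 32
#define MODEL_LAST_FPR_REGNO  63
#define MODEL_FIRST_DMR_REGNO 111
#define MODEL_LAST_DMR_REGNO  118

/* Return the accumulator number that %A would print, or -1 where the real
   code would report an invalid operand.  Only dense_math is consulted;
   MMA plays no part in the decision.  */
static int
a_modifier_model (bool dense_math, int regno)
{
  if (dense_math)
    {
      if (regno >= MODEL_FIRST_DMR_REGNO && regno <= MODEL_LAST_DMR_REGNO)
        return regno - MODEL_FIRST_DMR_REGNO;
      return -1;
    }

  /* Without dense math, an accumulator is an FPR whose number is a
     multiple of 4, and the accumulator number is that offset divided
     by 4.  */
  if (regno < MODEL_FIRST_FPR_REGNO || regno > MODEL_LAST_FPR_REGNO
      || (regno - MODEL_FIRST_FPR_REGNO) % 4 != 0)
    return -1;

  return (regno - MODEL_FIRST_FPR_REGNO) / 4;
}
```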
This also raises the question: what exactly will TARGET_MMA mean for
-mcpu=future? Will it mean only the MMA facility present in power10, or will it
also cover any new MMA facility that may be present in a future processor?
>
> The intention is that user code using extended asm can be modified to run on
> both MMA without dense math and MMA with dense math:
>
> 1) If possible, don't use extended asm, but instead use the MMA built-in
> functions;
>
> 2) If you do need to write extended asm, change the d constraints
> targeting accumulators to use wD;
>
> 3) Only use the built-in zero, assemble and disassemble functions create
> move data between vector quad types and dense math accumulators.
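The above item ("...create move data...") is not clear and needs rewriting.
Perhaps "create" was meant to be "to", i.e. "...functions to move data between
vector quad types and dense math accumulators"?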
> I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the
> extended asm code.
> The reason is these instructions assume there is a
> 1-to-1 correspondence between 4 adjacent FPR registers and an
> accumulator that overlaps with those instructions. With accumulators
> now being separate registers, there no longer is a 1-to-1
> correspondence.
>
> It is possible that the mangling for DMFs and the GDB register numbers may
> produce other changes in the future.
>
> I have built bootstrap GCC compilers on little endian and big endian
> PowerPC servers, and there were no regressions. Can I commit this
> patch to GCC 16 once the following patches have been applied?
>
> * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700539.html
> * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700540.html
> * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700542.html
>
> gcc/
>
> 2025-11-13 Michael Meissner <[email protected]>
>
> * config/rs6000/mma.md (UNSPEC_MMA_DMSETDMRZ): New unspec.
> (movxo): Add comments about dense math registers.
> (movxo_nodm): Rename from movxo and restrict the usage to machines
> without dense math registers.
> (movxo_dm): New insn for movxo support for machines with dense math
> registers.
> (mma_<acc>): Restrict usage to machines without dense math registers.
> (mma_xxsetaccz): Add a define_expand wrapper, and add support for dense
> math registers.
> (mma_dmsetaccz): New insn.
> * config/rs6000/predicates.md (dmf_operand): New predicate.
> (accumulator_operand): Add support for dense math registers.
> * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
> not issue a de-prime instruction when disassembling a vector quad on a
> system with dense math registers.
> * config/rs6000/rs6000-c.cc (rs6000_define_or_undefine_macro): Define
> __DENSE_MATH__ if we have dense math registers.
> * config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): Add -mdense-math.
> (POWERPC_MASKS): Likewise.
> * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMF_REG_TYPE.
> (enum rs6000_reload_reg_type): Add RELOAD_REG_DMF.
> (LAST_RELOAD_REG_CLASS): Add support for DMF registers and the wD
> constraint.
> (reload_reg_map): Likewise.
> (rs6000_reg_names): Likewise.
> (alt_reg_names): Likewise.
> (rs6000_hard_regno_nregs_internal): Likewise.
> (rs6000_hard_regno_mode_ok_uncached): Likewise.
> (rs6000_debug_reg_global): Likewise.
> (rs6000_setup_reg_addr_masks): Likewise.
> (rs6000_init_hard_regno_mode_ok): Likewise.
> (rs6000_option_override_internal): If -mdense-math, issue an error if
> -mno-mma or not -mcpu=future.
> (rs6000_secondary_reload_memory): Add support for DMF registers.
> (rs6000_secondary_reload_simple_move): Likewise.
> (rs6000_preferred_reload_class): Likewise.
> (rs6000_secondary_reload_class): Likewise.
> (print_operand): Make %A handle both FPRs and DMRs.
> (rs6000_dmf_register_move_cost): New helper function.
> (rs6000_register_move_cost): Add support for DMR registers.
> (rs6000_memory_move_cost): Likewise.
> (rs6000_compute_pressure_classes): Likewise.
> (rs6000_debugger_regno): Likewise.
> (rs6000_opt_masks): Add -mdense-math support.
> (rs6000_split_multireg_move): Add support for DMRs.
> * config/rs6000/rs6000.h (TARGET_MMA_NO_DENSE_MATH): New macro.
> (UNITS_PER_DMF_WORD): Likewise.
> (FIRST_PSEUDO_REGISTER): Update for DMRs.
> (FIXED_REGISTERS): Add DMRs.
> (CALL_REALLY_USED_REGISTERS): Likewise.
> (REG_ALLOC_ORDER): Likewise.
> (DMF_REGNO_P): New macro.
> (enum reg_class): Add DM_REGS.
> (REG_CLASS_NAMES): Likewise.
> (REG_CLASS_CONTENTS): Likewise.
> (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
> (REGISTER_NAMES): Add DMF registers.
> (ADDITIONAL_REGISTER_NAMES): Likewise.
> * config/rs6000/rs6000.md (FIRST_DMF_REGNO): New constant.
> (LAST_DMF_REGNO): Likewise.
> * config/rs6000/rs6000.opt (-mdense-math): New option.
> ---
> gcc/config/rs6000/mma.md | 74 +++++++--
> gcc/config/rs6000/predicates.md | 21 ++-
> gcc/config/rs6000/rs6000-builtin.cc | 5 +-
> gcc/config/rs6000/rs6000-c.cc | 9 +-
> gcc/config/rs6000/rs6000-cpus.def | 2 +
> gcc/config/rs6000/rs6000.cc | 231 +++++++++++++++++++++++-----
> gcc/config/rs6000/rs6000.h | 40 ++++-
> gcc/config/rs6000/rs6000.md | 2 +
> gcc/config/rs6000/rs6000.opt | 4 +
> 9 files changed, 325 insertions(+), 63 deletions(-)
>
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 9f866361376..3f5852ca2bb 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -90,6 +90,7 @@ (define_c_enum "unspec"
> UNSPEC_MMA_XVI8GER4SPP
> UNSPEC_MMA_XXMFACC
> UNSPEC_MMA_XXMTACC
> + UNSPEC_MMA_DMSETDMRZ
> ])
>
> (define_c_enum "unspecv"
> @@ -313,7 +314,9 @@ (define_insn_and_split "*movoo"
> (set_attr "length" "*,*,8")])
>
>
> -;; Vector quad support. XOmode can only live in FPRs.
> +;; Vector quad support. Under the original MMA, XOmode can only live in VSX
> +;; registers 0..31. With dense math, XOmode can live in either VSX registers
> +;; (0..63) or DMF registers.
It should be (0..31), not (0..63).
> (define_expand "movxo"
> [(set (match_operand:XO 0 "nonimmediate_operand")
> (match_operand:XO 1 "input_operand"))]
> @@ -338,10 +341,10 @@ (define_expand "movxo"
> gcc_assert (false);
> })
>
> -(define_insn_and_split "*movxo"
> +(define_insn_and_split "*movxo_nodm"
> [(set (match_operand:XO 0 "nonimmediate_operand" "=d,ZwO,d")
> (match_operand:XO 1 "input_operand" "ZwO,d,d"))]
> - "TARGET_MMA
> + "TARGET_MMA_NO_DENSE_MATH
> && (gpc_reg_operand (operands[0], XOmode)
> || gpc_reg_operand (operands[1], XOmode))"
> "@
> @@ -358,6 +361,31 @@ (define_insn_and_split "*movxo"
> (set_attr "length" "*,*,16")
> (set_attr "max_prefixed_insns" "2,2,*")])
>
> +(define_insn_and_split "*movxo_dm"
> + [(set (match_operand:XO 0 "nonimmediate_operand" "=wa,ZwO,wa,wD,wD,wa")
> + (match_operand:XO 1 "input_operand" "ZwO,wa, wa,wa,wD,wD"))]
> + "TARGET_DENSE_MATH
> + && (gpc_reg_operand (operands[0], XOmode)
> + || gpc_reg_operand (operands[1], XOmode))"
> + "@
> + #
> + #
> + #
> + dmxxinstdmr512 %0,%1,%Y1,0
> + dmmr %0,%1
> + dmxxextfdmr512 %0,%Y0,%1,0"
> + "&& reload_completed
> + && !dmf_operand (operands[0], XOmode)
> + && !dmf_operand (operands[1], XOmode)"
> + [(const_int 0)]
> +{
> + rs6000_split_multireg_move (operands[0], operands[1]);
> + DONE;
> +}
> + [(set_attr "type" "vecload,vecstore,veclogical,mma,mma,mma")
> + (set_attr "length" "*,*,16,*,*,*")
> + (set_attr "max_prefixed_insns" "2,2,*,*,*,*")])
> +
> (define_expand "vsx_assemble_pair"
> [(match_operand:OO 0 "vsx_register_operand")
> (match_operand:V16QI 1 "mma_assemble_input_operand")
> @@ -456,29 +484,53 @@ (define_expand "mma_disassemble_acc"
> DONE;
> })
>
> -;; MMA instructions that do not use their accumulators as an input, still
> -;; must not allow their vector operands to overlap the registers used by
> -;; the accumulator. We enforce this by marking the output as early clobber.
> +;; MMA instructions that do not use their accumulators as an input, still must
> +;; not allow their vector operands to overlap the registers used by the
> +;; accumulator. We enforce this by marking the output as early clobber. The
> +;; prime and de-prime instructions are not needed on systems with dense math
> +;; registers.
>
> (define_insn "mma_<acc>"
> [(set (match_operand:XO 0 "accumulator_operand" "=&wD")
> - (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")]
> + (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
> MMA_ACC))]
> - "TARGET_MMA"
> + "TARGET_MMA_NO_DENSE_MATH"
> "<acc> %A0"
> [(set_attr "type" "mma")])
>
> ;; We can't have integer constants in XOmode so we wrap this in an
> -;; UNSPEC_VOLATILE.
> +;; UNSPEC_VOLATILE. If we have dense math registers, we can just use a normal
> +;; UNSPEC instead of UNSPEC_VOLATILE.
>
> -(define_insn "mma_xxsetaccz"
> - [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +(define_expand "mma_xxsetaccz"
> + [(set (match_operand:XO 0 "accumulator_operand")
> (unspec_volatile:XO [(const_int 0)]
> UNSPECV_MMA_XXSETACCZ))]
> "TARGET_MMA"
> +{
> + if (TARGET_DENSE_MATH)
> + {
> + emit_insn (gen_mma_dmsetdmrz (operands[0]));
> + DONE;
> + }
> +})
> +
> +(define_insn "*mma_xxsetaccz"
> + [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> + (unspec_volatile:XO [(const_int 0)]
> + UNSPECV_MMA_XXSETACCZ))]
> + "TARGET_MMA_NO_DENSE_MATH"
> "xxsetaccz %A0"
> [(set_attr "type" "mma")])
>
> +(define_insn "mma_dmsetdmrz"
> + [(set (match_operand:XO 0 "accumulator_operand" "=wD")
> + (unspec [(const_int 0)]
> + UNSPEC_MMA_DMSETDMRZ))]
> + "TARGET_DENSE_MATH"
> + "dmsetdmrz %A0"
> + [(set_attr "type" "mma")])
> +
> (define_insn "mma_<vv>"
> [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
> (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 9f152037222..f1e03ec30c9 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -186,8 +186,23 @@ (define_predicate "vlogical_operand"
> return VLOGICAL_REGNO_P (REGNO (op));
> })
>
> +;; Return 1 if op is a DMF register
> +(define_predicate "dmf_operand"
> + (match_operand 0 "register_operand")
> +{
> + if (!REG_P (op))
> + return 0;
> +
> + if (!HARD_REGISTER_P (op))
> + return 1;
> +
> + return DMF_REGNO_P (REGNO (op));
> +})
> +
> ;; Return 1 if op is an accumulator. On power10 systems, the accumulators
> -;; overlap with the FPRs.
> +;; overlap with the FPRs, while on systems with dense math, the accumulators
> +;; are separate dense math registers and do not overlap with the FPR
> +;; registers.
> (define_predicate "accumulator_operand"
> (match_operand 0 "register_operand")
> {
> @@ -198,7 +213,9 @@ (define_predicate "accumulator_operand"
> return 1;
>
> int r = REGNO (op);
> - return FP_REGNO_P (r) && (r & 3) == 0;
> + return (TARGET_DENSE_MATH
> + ? DMF_REGNO_P (r)
> + : FP_REGNO_P (r) && (r & 3) == 0);
> })
>
> ;; Return 1 if op is the carry register.
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index bc1580f051b..6b7e5686f0c 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1125,8 +1125,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi,
> }
>
> /* If we're disassembling an accumulator into a different type, we need
> - to emit a xxmfacc instruction now, since we cannot do it later. */
> - if (fncode == RS6000_BIF_DISASSEMBLE_ACC)
> + to emit a xxmfacc instruction now, since we cannot do it later. If we
> + have dense math registers, we don't need to do this. */
> + if (fncode == RS6000_BIF_DISASSEMBLE_ACC && !TARGET_DENSE_MATH)
> {
> new_decl = rs6000_builtin_decls[RS6000_BIF_XXMFACC_INTERNAL];
> new_call = gimple_build_call (new_decl, 1, src);
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 6757a2477ad..e202fd6c7df 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -587,9 +587,14 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
> if (rs6000_cpu == PROCESSOR_CELL)
> rs6000_define_or_undefine_macro (define_p, "__PPU__");
>
> - /* Tell the user if we support the MMA instructions. */
> + /* Tell the user if we support the MMA instructions. Also tell them if MMA
> + uses the dense math registers. */
> if ((flags & OPTION_MASK_MMA) != 0)
> - rs6000_define_or_undefine_macro (define_p, "__MMA__");
> + {
> + rs6000_define_or_undefine_macro (define_p, "__MMA__");
> + if ((flags & OPTION_MASK_DENSE_MATH) != 0)
> + rs6000_define_or_undefine_macro (define_p, "__DENSE_MATH__");
> + }
> /* Whether pc-relative code is being generated. */
> if ((flags & OPTION_MASK_PCREL) != 0)
> rs6000_define_or_undefine_macro (define_p, "__PCREL__");
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index a0e6745495d..c03b069b779 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -91,6 +91,7 @@
> will be fixed in potential future machines. */
> #define FUTURE_MASKS_SERVER (POWER11_MASKS_SERVER \
> | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR \
> + | OPTION_MASK_DENSE_MATH \
> | OPTION_MASK_FUTURE)
>
> /* Flags that need to be turned off if -mno-vsx. */
> @@ -124,6 +125,7 @@
> | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR \
> | OPTION_MASK_CMPB \
> | OPTION_MASK_CRYPTO \
> + | OPTION_MASK_DENSE_MATH \
> | OPTION_MASK_DFP \
> | OPTION_MASK_DLMZB \
> | OPTION_MASK_EFFICIENT_UNALIGNED_VSX \
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index ac95ea05657..570e8a14f2d 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -292,7 +292,8 @@ enum rs6000_reg_type {
> ALTIVEC_REG_TYPE,
> FPR_REG_TYPE,
> SPR_REG_TYPE,
> - CR_REG_TYPE
> + CR_REG_TYPE,
> + DMF_REG_TYPE
s/DMF/DMR
> };
>
> /* Map register class to register type. */
> @@ -306,22 +307,23 @@ static enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES];
>
>
> /* Register classes we care about in secondary reload or go if legitimate
> - address. We only need to worry about GPR, FPR, and Altivec registers here,
> - along an ANY field that is the OR of the 3 register classes. */
> + address. We only need to worry about GPR, FPR, Altivec, and DMF registers
> + here, along an ANY field that is the OR of the 4 register classes. */
>
> enum rs6000_reload_reg_type {
> RELOAD_REG_GPR, /* General purpose registers. */
> RELOAD_REG_FPR, /* Traditional floating point regs. */
> RELOAD_REG_VMX, /* Altivec (VMX) registers. */
> - RELOAD_REG_ANY, /* OR of GPR, FPR, Altivec masks. */
> + RELOAD_REG_DMF, /* DMF registers. */
s/RELOAD_REG_DMF/RELOAD_REG_DMR
> + RELOAD_REG_ANY, /* OR of GPR/FPR/VMX/DMF masks. */
s/DMF/DMR
> N_RELOAD_REG
> };
>
> -/* For setting up register classes, loop through the 3 register classes mapping
> +/* For setting up register classes, loop through the 4 register classes mapping
> into real registers, and skip the ANY class, which is just an OR of the
> bits. */
> #define FIRST_RELOAD_REG_CLASS RELOAD_REG_GPR
> -#define LAST_RELOAD_REG_CLASS RELOAD_REG_VMX
> +#define LAST_RELOAD_REG_CLASS RELOAD_REG_DMF
>
> /* Map reload register type to a register in the register class. */
> struct reload_reg_map_type {
> @@ -333,6 +335,7 @@ static const struct reload_reg_map_type reload_reg_map[N_RELOAD_REG] = {
> { "Gpr", FIRST_GPR_REGNO }, /* RELOAD_REG_GPR. */
> { "Fpr", FIRST_FPR_REGNO }, /* RELOAD_REG_FPR. */
> { "VMX", FIRST_ALTIVEC_REGNO }, /* RELOAD_REG_VMX. */
> + { "DMF", FIRST_DMF_REGNO }, /* RELOAD_REG_DMF. */
s/DMF/DMR
> { "Any", -1 }, /* RELOAD_REG_ANY. */
> };
>
> @@ -1226,6 +1229,8 @@ char rs6000_reg_names[][8] =
> "0", "1", "2", "3", "4", "5", "6", "7",
> /* vrsave vscr sfp */
> "vrsave", "vscr", "sfp",
> + /* DMFs */
s/DMF/DMR
> + "0", "1", "2", "3", "4", "5", "6", "7",
> };
>
> #ifdef TARGET_REGNAMES
> @@ -1252,6 +1257,8 @@ static const char alt_reg_names[][8] =
> "%cr0", "%cr1", "%cr2", "%cr3", "%cr4", "%cr5", "%cr6", "%cr7",
> /* vrsave vscr sfp */
> "vrsave", "vscr", "sfp",
> + /* DMFs */
s/DMF/DMR
> + "%dmr0", "%dmr1", "%dmr2", "%dmr3", "%dmr4", "%dmr5", "%dmr6", "%dmr7",
> };
> #endif
>
> @@ -1842,6 +1849,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
> else if (ALTIVEC_REGNO_P (regno))
> reg_size = UNITS_PER_ALTIVEC_WORD;
>
> + else if (DMF_REGNO_P (regno))
> + reg_size = UNITS_PER_DMF_WORD;
> +
> else
> reg_size = UNITS_PER_WORD;
>
> @@ -1863,9 +1873,35 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
> if (mode == OOmode)
> return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0);
>
> - /* MMA accumulator modes need FPR registers divisible by 4. */
> + /* On ISA 3.1 (power10), MMA accumulator modes need FPR registers divisible
> + by 4.
> +
> + If dense math registers are enabled, we can allow all VSX registers plus
> + the DMF registers. VSX registers are used to load and store the registers
> + as the accumulator registers do not have load and store instructions.
> + Because we just use the VSX registers for load/store operations, we just
> + need to make sure load vector pair and store vector pair instructions can
> + be used. */
> if (mode == XOmode)
> - return (TARGET_MMA && FP_REGNO_P (regno) && (regno & 3) == 0);
> + {
> + if (!TARGET_MMA)
> + return 0;
XOmode can be in use even when TARGET_MMA is false (dense math without MMA), so
this should not return 0 before checking for dense math.
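Something along the following lines is what I have in mind. This is a
standalone model rather than a replacement hunk: xo_mode_ok_model and the
register ranges are assumptions for illustration (FPRs at 32..63, the
remaining VSX registers at 64..95, dense math registers at 111..118), and the
two booleans stand in for TARGET_MMA and TARGET_DENSE_MATH.

```c
#include <stdbool.h>

/* Illustrative register ranges; assumptions, not taken from rs6000.md.  */
static bool model_fp_regno_p (int r)  { return r >= 32 && r <= 63; }
static bool model_vsx_regno_p (int r) { return r >= 32 && r <= 95; }
static bool model_dmr_regno_p (int r) { return r >= 111 && r <= 118; }

/* Model of the XOmode check with dense math tested first, so that the
   answer does not depend on MMA when dense math is enabled.  */
static bool
xo_mode_ok_model (bool mma, bool dense_math, int regno, int last_regno)
{
  if (dense_math)
    {
      if (model_dmr_regno_p (regno))
        return true;

      /* VSX registers are only used to load and store the value, so an
         even-numbered starting register that stays within VSX is enough.  */
      return (model_vsx_regno_p (regno)
              && model_vsx_regno_p (last_regno)
              && (regno & 1) == 0);
    }

  /* power10 style accumulators: FPRs whose number is divisible by 4.  */
  return mma && model_fp_regno_p (regno) && (regno & 3) == 0;
}
```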
> +
> + else if (!TARGET_DENSE_MATH)
> + return (FP_REGNO_P (regno) && (regno & 3) == 0);
> +
> + else if (DMF_REGNO_P (regno))
> + return 1;
> +
> + else
> + return (VSX_REGNO_P (regno)
> + && VSX_REGNO_P (last_regno)
> + && (regno & 1) == 0);
> + }
> +
> + /* No other types other than XOmode can go in DMFs. */
> + if (DMF_REGNO_P (regno))
> + return 0;
>
> /* PTImode can only go in GPRs. Quad word memory operations require even/odd
> register combinations, and use PTImode where we need to deal with quad
> @@ -2308,6 +2344,7 @@ rs6000_debug_reg_global (void)
> rs6000_debug_reg_print (FIRST_ALTIVEC_REGNO,
> LAST_ALTIVEC_REGNO,
> "vs");
> + rs6000_debug_reg_print (FIRST_DMF_REGNO, LAST_DMF_REGNO, "dmf");
> rs6000_debug_reg_print (LR_REGNO, LR_REGNO, "lr");
> rs6000_debug_reg_print (CTR_REGNO, CTR_REGNO, "ctr");
> rs6000_debug_reg_print (CR0_REGNO, CR7_REGNO, "cr");
> @@ -2634,6 +2671,21 @@ rs6000_setup_reg_addr_masks (void)
> addr_mask = 0;
> reg = reload_reg_map[rc].reg;
>
> + /* Special case DMF registers. */
> + if (rc == RELOAD_REG_DMF)
> + {
> + if (TARGET_DENSE_MATH && m2 == XOmode)
> + {
> + addr_mask = RELOAD_REG_VALID;
> + reg_addr[m].addr_mask[rc] = addr_mask;
> + any_addr_mask |= addr_mask;
> + }
> + else
> + reg_addr[m].addr_mask[rc] = 0;
> +
> + continue;
> + }
> +
> /* Can mode values go in the GPR/FPR/Altivec registers? */
> if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg])
> {
> @@ -2784,6 +2836,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
> for (r = CR1_REGNO; r <= CR7_REGNO; ++r)
> rs6000_regno_regclass[r] = CR_REGS;
>
> + for (r = FIRST_DMF_REGNO; r <= LAST_DMF_REGNO; ++r)
> + rs6000_regno_regclass[r] = DM_REGS;
> +
> rs6000_regno_regclass[LR_REGNO] = LINK_REGS;
> rs6000_regno_regclass[CTR_REGNO] = CTR_REGS;
> rs6000_regno_regclass[CA_REGNO] = NO_REGS;
> @@ -2808,6 +2863,7 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
> reg_class_to_reg_type[(int)LINK_OR_CTR_REGS] = SPR_REG_TYPE;
> reg_class_to_reg_type[(int)CR_REGS] = CR_REG_TYPE;
> reg_class_to_reg_type[(int)CR0_REGS] = CR_REG_TYPE;
> + reg_class_to_reg_type[(int)DM_REGS] = DMF_REG_TYPE;
>
> if (TARGET_VSX)
> {
> @@ -2994,8 +3050,11 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
> if (TARGET_DIRECT_MOVE_128)
> rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
>
> + /* Support for the accumulator registers, either FPR registers (aka original
> + mma) or DMF registers (dense math). */
> if (TARGET_MMA)
> - rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS;
> + rs6000_constraints[RS6000_CONSTRAINT_wD]
> + = TARGET_DENSE_MATH ? DM_REGS : FLOAT_REGS;
>
> /* Set up the reload helper and direct move functions. */
> if (TARGET_VSX || TARGET_ALTIVEC)
> @@ -4410,6 +4469,16 @@ rs6000_option_override_internal (bool global_init_p)
> if (!TARGET_PCREL && TARGET_PCREL_OPT)
> rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
>
> + /* Turn off dense math MMA+ options on non-future systems. */
> + if (TARGET_DENSE_MATH && (!TARGET_MMA || !TARGET_FUTURE))
> + {
> + if ((rs6000_isa_flags_explicit & OPTION_MASK_DENSE_MATH) != 0)
> + error ("%qs requires %qs", "-mdense-math",
> + (!TARGET_FUTURE ? "-mcpu=future" : "-mma"));
> +
> + rs6000_isa_flags &= ~OPTION_MASK_DENSE_MATH;
> + }
> +
> if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
> rs6000_print_isa_options (stderr, 0, "after subtarget",
> rs6000_isa_flags);
>
> @@ -12356,6 +12425,11 @@ rs6000_secondary_reload_memory (rtx addr,
> addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_VMX]
> & ~RELOAD_REG_AND_M16);
>
> + /* DMF registers use VSX registers for memory operations, and need to
> + generate some extra instructions. */
> + else if (rclass == DM_REGS)
> + return 2;
> +
> /* If the register allocator hasn't made up its mind yet on the register
> class to use, settle on defaults to use. */
> else if (rclass == NO_REGS)
> @@ -12684,6 +12758,13 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
> || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE)))
> return true;
>
> + /* We can transfer between VSX registers and DMF registers without needing
> + extra registers. */
> + if (TARGET_DENSE_MATH && mode == XOmode
> + && ((to_type == DMF_REG_TYPE && from_type == VSX_REG_TYPE)
> + || (to_type == VSX_REG_TYPE && from_type == DMF_REG_TYPE)))
> + return true;
> +
> return false;
> }
>
> @@ -13378,6 +13459,10 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
> machine_mode mode = GET_MODE (x);
> bool is_constant = CONSTANT_P (x);
>
> + /* DMF registers can't be loaded or stored. */
> + if (rclass == DM_REGS)
> + return NO_REGS;
> +
> /* If a mode can't go in FPR/ALTIVEC/VSX registers, don't return a preferred
> reload class for it. */
> if ((rclass == ALTIVEC_REGS || rclass == VSX_REGS)
> @@ -13474,7 +13559,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
> return VSX_REGS;
>
> if (mode == XOmode)
> - return FLOAT_REGS;
> + return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS;
>
> if (GET_MODE_CLASS (mode) == MODE_INT)
> return GENERAL_REGS;
> @@ -13599,6 +13684,11 @@ rs6000_secondary_reload_class (enum reg_class rclass, machine_mode mode,
> else
> regno = -1;
>
> + /* DMF registers don't have loads or stores. We have to go through the VSX
> + registers to load XOmode (vector quad). */
> + if (TARGET_DENSE_MATH && rclass == DM_REGS)
> + return VSX_REGS;
> +
> /* If we have VSX register moves, prefer moving scalar values between
> Altivec registers and GPR by going via an FPR (and then via memory)
> instead of reloading the secondary memory address for Altivec moves. */
> @@ -14130,8 +14220,19 @@ print_operand (FILE *file, rtx x, int code)
> output_operand. */
>
> case 'A':
> - /* Write the MMA accumulator number associated with VSX register X. */
> - if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
> + /* Write the MMA accumulator number associated with VSX register X. On
> + dense math systems, only allow DMF accumulators, not accumulators
> + overlapping with the FPR registers. */
> + if (!REG_P (x))
> + output_operand_lossage ("invalid %%A value");
> + else if (TARGET_DENSE_MATH)
> + {
> + if (DMF_REGNO_P (REGNO (x)))
> + fprintf (file, "%d", REGNO (x) - FIRST_DMF_REGNO);
> + else
> + output_operand_lossage ("%%A operand is not a DMF");
> + }
> + else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
> output_operand_lossage ("invalid %%A value");
> else
> fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
> @@ -22751,6 +22852,31 @@ rs6000_debug_address_cost (rtx x, machine_mode mode,
> }
>
>
> +/* Subroutine to determine the move cost of dense math registers. If we are
> + moving to/from VSX_REGISTER registers, the cost is either 1 move (for
> + 512-bit accumulators) or 2 moves (for 1,024 dmf registers). If we are
> + moving to anything else like GPR registers, make the cost very high. */
> +
> +static int
> +rs6000_dmf_register_move_cost (machine_mode mode, reg_class_t rclass)
> +{
> + const int reg_move_base = 2;
> + HARD_REG_SET vsx_set = (reg_class_contents[rclass]
> + & reg_class_contents[VSX_REGS]);
> +
> + if (TARGET_DENSE_MATH && !hard_reg_set_empty_p (vsx_set))
> + {
> + /* __vector_quad (i.e. XOmode) is transferred in 1 instruction. */
> + if (mode == XOmode)
> + return reg_move_base;
> +
> + else
> + return reg_move_base * 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
> + }
> +
> + return 1000 * 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
> +}
> +
> /* A C expression returning the cost of moving data from a register of class
> CLASS1 to one of CLASS2. */
>
> @@ -22764,17 +22890,28 @@ rs6000_register_move_cost (machine_mode mode,
> if (TARGET_DEBUG_COST)
> dbg_cost_ctrl++;
>
> + HARD_REG_SET to_vsx, from_vsx;
> + to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
> + from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
> +
> + /* Special case DMF registers, that can only move to/from VSX registers. */
> + if (from == DM_REGS && to == DM_REGS)
> + ret = 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
> +
> + else if (from == DM_REGS)
> + ret = rs6000_dmf_register_move_cost (mode, to);
> +
> + else if (to == DM_REGS)
> + ret = rs6000_dmf_register_move_cost (mode, from);
> +
> /* If we have VSX, we can easily move between FPR or Altivec registers,
> otherwise we can only easily move within classes.
> Do this first so we give best-case answers for union classes
> containing both gprs and vsx regs. */
> - HARD_REG_SET to_vsx, from_vsx;
> - to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
> - from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
> - if (!hard_reg_set_empty_p (to_vsx)
> - && !hard_reg_set_empty_p (from_vsx)
> - && (TARGET_VSX
> - || hard_reg_set_intersect_p (to_vsx, from_vsx)))
> + else if (!hard_reg_set_empty_p (to_vsx)
> + && !hard_reg_set_empty_p (from_vsx)
> + && (TARGET_VSX
> + || hard_reg_set_intersect_p (to_vsx, from_vsx)))
> {
> int reg = FIRST_FPR_REGNO;
> if (TARGET_VSX
> @@ -22870,6 +23007,9 @@ rs6000_memory_move_cost (machine_mode mode, reg_class_t rclass,
> ret = 4 * hard_regno_nregs (32, mode);
> else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS))
> ret = 4 * hard_regno_nregs (FIRST_ALTIVEC_REGNO, mode);
> + else if (reg_classes_intersect_p (rclass, DM_REGS))
> + ret = (rs6000_dmf_register_move_cost (mode, VSX_REGS)
> + + rs6000_memory_move_cost (mode, VSX_REGS, false));
> else
> ret = 4 + rs6000_register_move_cost (mode, rclass, GENERAL_REGS);
>
> @@ -24078,6 +24218,8 @@ rs6000_compute_pressure_classes (enum reg_class *pressure_classes)
> if (TARGET_HARD_FLOAT)
> pressure_classes[n++] = FLOAT_REGS;
> }
> + if (TARGET_DENSE_MATH)
> + pressure_classes[n++] = DM_REGS;
> pressure_classes[n++] = CR_REGS;
> pressure_classes[n++] = SPECIAL_REGS;
>
> @@ -24242,6 +24384,10 @@ rs6000_debugger_regno (unsigned int regno, unsigned int format)
> return 67;
> if (regno == 64)
> return 64;
> + /* XXX: This is a guess. The GCC register number for FIRST_DMF_REGNO is 111,
> + but the frame pointer regnum uses that. */
> + if (DMF_REGNO_P (regno))
> + return regno - FIRST_DMF_REGNO + 112;
>
> gcc_unreachable ();
> }
> @@ -24463,6 +24609,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
> false, true },
> { "cmpb", OPTION_MASK_CMPB, false, true },
> { "crypto", OPTION_MASK_CRYPTO, false,
> true },
> + { "dense-math", OPTION_MASK_DENSE_MATH, false, true },
> { "direct-move", 0, false, true },
> { "dlmzb", OPTION_MASK_DLMZB, false, true },
> { "efficient-unaligned-vsx", OPTION_MASK_EFFICIENT_UNALIGNED_VSX,
> @@ -27480,9 +27627,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> unsigned offset = 0;
> unsigned size = GET_MODE_SIZE (reg_mode);
>
> - /* If we are reading an accumulator register, we have to
> - deprime it before we can access it. */
> - if (TARGET_MMA
> + /* If we are reading an accumulator register, we have to deprime it
> + before we can access it unless we have dense math registers. */
> + if (TARGET_MMA_NO_DENSE_MATH
> && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
> emit_insn (gen_mma_xxmfacc (src, src));
>
> @@ -27514,9 +27661,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> emit_insn (gen_rtx_SET (dst2, src2));
> }
>
> - /* If we are writing an accumulator register, we have to
> - prime it after we've written it. */
> - if (TARGET_MMA
> + /* If we are writing an accumulator register, we have to prime it
> + after we've written it unless we have dense math registers. */
> + if (TARGET_MMA_NO_DENSE_MATH
> && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
> emit_insn (gen_mma_xxmtacc (dst, dst));
>
> @@ -27530,7 +27677,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> || XINT (src, 1) == UNSPECV_MMA_ASSEMBLE);
> gcc_assert (REG_P (dst));
> if (GET_MODE (src) == XOmode)
> - gcc_assert (FP_REGNO_P (REGNO (dst)));
> + gcc_assert ((TARGET_DENSE_MATH
> + ? VSX_REGNO_P (REGNO (dst))
> + : FP_REGNO_P (REGNO (dst))));
> if (GET_MODE (src) == OOmode)
> gcc_assert (VSX_REGNO_P (REGNO (dst)));
>
> @@ -27583,9 +27732,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> emit_insn (gen_rtx_SET (dst_i, op));
> }
>
> - /* We are writing an accumulator register, so we have to
> - prime it after we've written it. */
> - if (GET_MODE (src) == XOmode)
> + /* We are writing an accumulator register, so we have to prime it
> + after we've written it unless we have dense math registers. */
> + if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH)
> emit_insn (gen_mma_xxmtacc (dst, dst));
>
> return;
> @@ -27596,9 +27745,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>
> if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
> {
> - /* If we are reading an accumulator register, we have to
> - deprime it before we can access it. */
> - if (TARGET_MMA
> + /* If we are reading an accumulator register, we have to deprime it
> + before we can access it unless we have dense math registers. */
> + if (TARGET_MMA_NO_DENSE_MATH
> && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
> emit_insn (gen_mma_xxmfacc (src, src));
>
> @@ -27624,9 +27773,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> i * reg_mode_size)));
> }
>
> - /* If we are writing an accumulator register, we have to
> - prime it after we've written it. */
> - if (TARGET_MMA
> + /* If we are writing an accumulator register, we have to prime it after
> + we've written it unless we have dense math registers. */
> + if (TARGET_MMA_NO_DENSE_MATH
> && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
> emit_insn (gen_mma_xxmtacc (dst, dst));
> }
> @@ -27761,9 +27910,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
> }
>
> - /* If we are reading an accumulator register, we have to
> - deprime it before we can access it. */
> - if (TARGET_MMA && REG_P (src)
> + /* If we are reading an accumulator register, we have to deprime it
> + before we can access it unless we have dense math registers. */
> + if (TARGET_MMA_NO_DENSE_MATH && REG_P (src)
> && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
> emit_insn (gen_mma_xxmfacc (src, src));
>
> @@ -27793,9 +27942,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
> j * reg_mode_size)));
> }
>
> - /* If we are writing an accumulator register, we have to
> - prime it after we've written it. */
> - if (TARGET_MMA && REG_P (dst)
> + /* If we are writing an accumulator register, we have to prime it after
> + we've written it unless we have dense math registers. */
> + if (TARGET_MMA_NO_DENSE_MATH && REG_P (dst)
> && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
> emit_insn (gen_mma_xxmtacc (dst, dst));
>
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index d1f953630f7..169d81e208e 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -556,6 +556,9 @@ extern int rs6000_vector_align[];
> #define TARGET_DIRECT_MOVE_64BIT (TARGET_DIRECT_MOVE \
> && TARGET_POWERPC64)
>
> +/* Whether we have MMA support without dense math support. */
> +#define TARGET_MMA_NO_DENSE_MATH (TARGET_MMA && !TARGET_DENSE_MATH)
> +
> /* Inlining allows targets to define the meanings of bits in target_info
> field of ipa_fn_summary by itself, the used bits for rs6000 are listed
> below. */
> @@ -653,6 +656,7 @@ extern unsigned char rs6000_recip_bits[];
> #define UNITS_PER_FP_WORD 8
> #define UNITS_PER_ALTIVEC_WORD 16
> #define UNITS_PER_VSX_WORD 16
> +#define UNITS_PER_DMF_WORD 128
>
> /* Type used for ptrdiff_t, as a string used in a declaration. */
> #define PTRDIFF_TYPE "int"
> @@ -766,7 +770,7 @@ enum data_align { align_abi, align_opt, align_both };
> Another pseudo (not included in DWARF_FRAME_REGISTERS) is soft frame
> pointer, which is eventually eliminated in favor of SP or FP. */
>
> -#define FIRST_PSEUDO_REGISTER 111
> +#define FIRST_PSEUDO_REGISTER 119
>
> /* Use standard DWARF numbering for DWARF debugging information. */
> #define DEBUGGER_REGNO(REGNO) rs6000_debugger_regno ((REGNO), 0)
> @@ -803,7 +807,9 @@ enum data_align { align_abi, align_opt, align_both };
> /* cr0..cr7 */ \
> 0, 0, 0, 0, 0, 0, 0, 0, \
> /* vrsave vscr sfp */ \
> - 1, 1, 1 \
> + 1, 1, 1, \
> + /* DMF registers. */ \
> + 0, 0, 0, 0, 0, 0, 0, 0 \
> }
>
> /* Like `CALL_USED_REGISTERS' except this macro doesn't require that
> @@ -827,7 +833,9 @@ enum data_align { align_abi, align_opt, align_both };
> /* cr0..cr7 */ \
> 1, 1, 0, 0, 0, 1, 1, 1, \
> /* vrsave vscr sfp */ \
> - 0, 0, 0 \
> + 0, 0, 0, \
> + /* DMF registers. */ \
> + 0, 0, 0, 0, 0, 0, 0, 0 \
> }
>
> #define TOTAL_ALTIVEC_REGS (LAST_ALTIVEC_REGNO - FIRST_ALTIVEC_REGNO + 1)
> @@ -864,6 +872,7 @@ enum data_align { align_abi, align_opt, align_both };
> v2 (not saved; incoming vector arg reg; return value)
> v19 - v14 (not saved or used for anything)
> v31 - v20 (saved; order given to save least number)
> + dmr0 - dmr7 (not saved)
> vrsave, vscr (fixed)
> sfp (fixed)
> */
> @@ -906,6 +915,9 @@ enum data_align { align_abi, align_opt, align_both };
> 66, \
> 83, 82, 81, 80, 79, 78, \
> 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, \
> + /* DMF registers. */ \
> + 111, 112, 113, 114, 115, 116, 117, 118, \
> + /* Vrsave, vscr, sfp. */ \
> 108, 109, \
> 110 \
> }
> @@ -932,6 +944,9 @@ enum data_align { align_abi, align_opt, align_both };
> /* True if register is a VSX register. */
> #define VSX_REGNO_P(N) (FP_REGNO_P (N) || ALTIVEC_REGNO_P (N))
>
> +/* True if register is a DMF register. */
> +#define DMF_REGNO_P(N) ((N) >= FIRST_DMF_REGNO && (N) <= LAST_DMF_REGNO)
> +
> /* Alternate name for any vector register supporting floating point, no matter
> which instruction set(s) are available. */
> #define VFLOAT_REGNO_P(N) \
> @@ -1069,6 +1084,7 @@ enum reg_class
> FLOAT_REGS,
> ALTIVEC_REGS,
> VSX_REGS,
> + DM_REGS,
> VRSAVE_REGS,
> VSCR_REGS,
> GEN_OR_FLOAT_REGS,
> @@ -1098,6 +1114,7 @@ enum reg_class
> "FLOAT_REGS", \
> "ALTIVEC_REGS", \
> "VSX_REGS", \
> + "DM_REGS", \
> "VRSAVE_REGS", \
> "VSCR_REGS", \
> "GEN_OR_FLOAT_REGS", \
> @@ -1132,6 +1149,8 @@ enum reg_class
> { 0x00000000, 0x00000000, 0xffffffff, 0x00000000 }, \
> /* VSX_REGS. */ \
> { 0x00000000, 0xffffffff, 0xffffffff, 0x00000000 }, \
> + /* DM_REGS. */ \
> + { 0x00000000, 0x00000000, 0x00000000, 0x007f8000 }, \
> /* VRSAVE_REGS. */ \
> { 0x00000000, 0x00000000, 0x00000000, 0x00001000 }, \
> /* VSCR_REGS. */ \
> @@ -1159,7 +1178,7 @@ enum reg_class
> /* CA_REGS. */ \
> { 0x00000000, 0x00000000, 0x00000000, 0x00000004 }, \
> /* ALL_REGS. */ \
> - { 0xffffffff, 0xffffffff, 0xffffffff, 0x00007fff } \
> + { 0xffffffff, 0xffffffff, 0xffffffff, 0x007fffff } \
> }
>
> /* The same information, inverted:
> @@ -2060,7 +2079,16 @@ extern char rs6000_reg_names[][8]; /* register names (0 vs. %r0). */
> &rs6000_reg_names[108][0], /* vrsave */ \
> &rs6000_reg_names[109][0], /* vscr */ \
> \
> - &rs6000_reg_names[110][0] /* sfp */ \
> + &rs6000_reg_names[110][0], /* sfp */ \
> + \
> + &rs6000_reg_names[111][0], /* dmr0 */ \
> + &rs6000_reg_names[112][0], /* dmr1 */ \
> + &rs6000_reg_names[113][0], /* dmr2 */ \
> + &rs6000_reg_names[114][0], /* dmr3 */ \
> + &rs6000_reg_names[115][0], /* dmr4 */ \
> + &rs6000_reg_names[116][0], /* dmr5 */ \
> + &rs6000_reg_names[117][0], /* dmr6 */ \
> + &rs6000_reg_names[118][0], /* dmr7 */ \
> }
>
> /* Table of additional register names to use in user input. */
> @@ -2114,6 +2142,8 @@ extern char rs6000_reg_names[][8]; /* register names (0 vs. %r0). */
> {"vs52", 84}, {"vs53", 85}, {"vs54", 86}, {"vs55", 87}, \
> {"vs56", 88}, {"vs57", 89}, {"vs58", 90}, {"vs59", 91}, \
> {"vs60", 92}, {"vs61", 93}, {"vs62", 94}, {"vs63", 95}, \
> + {"dmr0", 111}, {"dmr1", 112}, {"dmr2", 113}, {"dmr3", 114}, \
> + {"dmr4", 115}, {"dmr5", 116}, {"dmr6", 117}, {"dmr7", 118}, \
> }
>
> /* This is how to output an element of a case-vector that is relative. */
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index ff085bf9bb1..0717e86e9d6 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -51,6 +51,8 @@ (define_constants
> (VRSAVE_REGNO 108)
> (VSCR_REGNO 109)
> (FRAME_POINTER_REGNUM 110)
> + (FIRST_DMF_REGNO 111)
> + (LAST_DMF_REGNO 118)
> ])
>
> ;;
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 7c4f0375424..72578644037 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -639,6 +639,10 @@ mfuture
> Target Undocumented Mask(FUTURE) Var(rs6000_isa_flags) Warn(Do not use %<-mfuture>, use %<-mcpu=future>)
> Generate (do not generate) potential future instructions.
>
> +mdense_math
> +Target Mask(DENSE_MATH) Var(rs6000_isa_flags)
> +Generate (do not generate) dense math MMA+ instructions.
The dense math registers can be used by both MMA+ and non-MMA+
instructions, so the description above needs to be reworded. The
-mdense-math flag should enable/disable use of the dense math registers.
-Surya
> +
> ; Documented parameters
>
> -param=rs6000-vect-unroll-limit=