Re: [PATCH V2, 2/3] Add support for dense math registers on a future PowerPC

Surya Kumari Jangala Tue, 27 Jan 2026 05:31:25 -0800

Hi Mike,

Overall, I have the following comments regarding the patch:


1. MMA and Dense Math have to be decoupled. The future processor
can use dense math registers for non-MMA instructions too.

2. Use DMR (not DMF) consistently wherever we refer to a dense math register.

3. Decide what we want TARGET_MMA to refer to. Will it also refer to
the new MMA instructions that may be added to the future processor?


On 14/11/25 1:25 pm, Michael Meissner wrote:
> The MMA subsystem added the notion of accumulator registers as an optional
> feature of ISA 3.1 (power10).  In ISA 3.1, these accumulators overlapped with
> the VSX registers 0..31, but logically the accumulator registers were separate
> from the FPR registers.  In ISA 3.1, it was anticipated that in future systems,
> the accumulator registers may no overlap with the FPR registers.  This patch
> adds the support for dense math registers as separate registers.
> 
> This particular patch does not change the MMA support to use the accumulators
> within the dense math registers.  This patch just adds the basic support for
> having separate DMRs.  The next patch will switch the MMA support to use the
> accumulators if -mcpu=future is used.
> 
> For testing purposes, I added an undocumented option '-mdense-math' to enable
> or disable the dense math support.
> 
> This patch updates the wD constraint added in the previous patch.  If MMA is
> selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint
> will allow access to accumulators that overlap with VSX registers 0..31.  If
> both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint
> will only allow dense math registers.

The future processor can use dense math registers for certain non-MMA
operations.  So the wD constraint should allow dense math registers even if
MMA is not selected.  The behaviour of the wD constraint should be solely
determined by the -mdense-math option.
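
To make the suggestion concrete, here is a rough standalone sketch of the
selection I have in mind.  It is not rs6000 code: the enum and the function
name are made up for illustration, and it assumes the "-mdense-math requires
-mma" check in rs6000_option_override_internal is relaxed as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rs6000 register classes.  */
enum sketch_reg_class
{
  SKETCH_NO_REGS,
  SKETCH_FLOAT_REGS,
  SKETCH_DM_REGS
};

/* Pick the register class for the wD constraint.  Dense math alone decides
   whether DMRs are used; MMA only matters for the power10-style case where
   the accumulators overlap the FPRs.  */
enum sketch_reg_class
sketch_wD_class (bool dense_math, bool mma)
{
  if (dense_math)
    return SKETCH_DM_REGS;

  return mma ? SKETCH_FLOAT_REGS : SKETCH_NO_REGS;
}

/* Self-check of the case that differs from the patch: dense math selected
   without MMA still yields the dense math class.  */
void
sketch_wD_selfcheck (void)
{
  assert (sketch_wD_class (true, false) == SKETCH_DM_REGS);
}
```

The only difference from the patch is that the dense math test is not nested
under the MMA test.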

> 
> This patch modifies the existing %A output modifier.  If MMA is selected but
> dense math is not selected, then %A output modifier converts the VSX register
> number to the accumulator number, by dividing it by 4.  If both MMA and dense
> math are selected, then %A will map the separate DMF registers into 0..7.

Similarly, the behaviour of the %A output modifier should depend solely on
whether dense math is selected.  For the future power processor, the
behaviour of %A should not depend on MMA.
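
As a rough model of the mapping (again a standalone sketch, not the actual
print_operand code; the register numbers are illustrative, and the function
returns -1 where the real code would call output_operand_lossage):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative register numbers only; the real values come from rs6000.md.  */
#define SKETCH_A_FIRST_FPR 32
#define SKETCH_A_LAST_FPR 63
#define SKETCH_A_FIRST_DMR 111
#define SKETCH_A_LAST_DMR 118

/* Return the accumulator number that %A would print for hard register
   REGNO, or -1 if the operand is not valid.  Only DENSE_MATH selects the
   mapping; MMA is deliberately not an input.  */
int
sketch_print_A (bool dense_math, int regno)
{
  assert (regno >= 0);

  if (dense_math)
    {
      /* Separate dense math registers map directly onto 0..7.  */
      if (regno < SKETCH_A_FIRST_DMR || regno > SKETCH_A_LAST_DMR)
        return -1;
      return regno - SKETCH_A_FIRST_DMR;
    }

  /* Without dense math, an accumulator is 4 adjacent FPRs, so the FPR
     number must be a multiple of 4 and is divided by 4.  */
  if (regno < SKETCH_A_FIRST_FPR || regno > SKETCH_A_LAST_FPR
      || (regno - SKETCH_A_FIRST_FPR) % 4 != 0)
    return -1;

  return (regno - SKETCH_A_FIRST_FPR) / 4;
}
```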

Decoupling wD and %A from MMA raises the question: what exactly will
TARGET_MMA mean for -mcpu=future?  Will it mean only the MMA facility present
in power10?  Or will it also mean the new MMA facility that may be present in
a future processor?

> 
> The intention is that user code using extended asm can be modified to run on
> both MMA without dense math and MMA with dense math:
> 
>     1)        If possible, don't use extended asm, but instead use the MMA
>       built-in functions;
> 
>     2)        If you do need to write extended asm, change the d constraints
>       targetting accumulators should now use wD;
> 
>     3)        Only use the built-in zero, assemble and disassemble functions
>       create move data between vector quad types and dense math accumulators.

The above line ("...create move data...") is not clear. It needs rewriting.

>       I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the
>       extended asm code.  The reason is these instructions assume there is a
>       1-to-1 correspondence between 4 adjacent FPR registers and an
>       accumulator that overlaps with those instructions.  With accumulators
>       now being separate registers, there no longer is a 1-to-1
>       correspondence.
> 
> It is possible that the mangling for DMFs and the GDB register numbers may
> produce other changes in the future.
> 
> I have built bootstrap GCC compilers on little endian and big endian
> PowerPC servers, and there were no regressions.  Can I commit this
> patch to GCC 16 once the following patches have been applied?
> 
>   * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700539.html
>   * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700540.html
>   * https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700542.html
> 
> gcc/
> 
> 2025-11-13   Michael Meissner  <[email protected]>
> 
>       * config/rs6000/mma.md (UNSPEC_MMA_DMSETDMRZ): New unspec.
>       (movxo): Add comments about dense math registers.
>       (movxo_nodm): Rename from movxo and restrict the usage to machines
>       without dense math registers.
>       (movxo_dm): New insn for movxo support for machines with dense math
>       registers.
>       (mma_<acc>): Restrict usage to machines without dense math registers.
>       (mma_xxsetaccz): Add a define_expand wrapper, and add support for dense
>       math registers.
>       (mma_dmsetaccz): New insn.
>       * config/rs6000/predicates.md (dmf_operand): New predicate.
>       (accumulator_operand): Add support for dense math registers.
>       * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
>       not issue a de-prime instruction when disassembling a vector quad on a
>       system with dense math registers.
>       * config/rs6000/rs6000-c.cc (rs6000_define_or_undefine_macro): Define
>       __DENSE_MATH__ if we have dense math registers.
>       * config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): Add -mdense-math.
>       (POWERPC_MASKS): Likewise.
>       * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMF_REG_TYPE.
>       (enum rs6000_reload_reg_type): Add RELOAD_REG_DMF.
>       (LAST_RELOAD_REG_CLASS): Add support for DMF registers and the wD
>       constraint.
>       (reload_reg_map): Likewise.
>       (rs6000_reg_names): Likewise.
>       (alt_reg_names): Likewise.
>       (rs6000_hard_regno_nregs_internal): Likewise.
>       (rs6000_hard_regno_mode_ok_uncached): Likewise.
>       (rs6000_debug_reg_global): Likewise.
>       (rs6000_setup_reg_addr_masks): Likewise.
>       (rs6000_init_hard_regno_mode_ok): Likewise.
>       (rs6000_option_override_internal): If -mdense-math, issue an error if
>       -mno-mma or not -mcpu=future.
>       (rs6000_secondary_reload_memory): Add support for DMF registers.
>       (rs6000_secondary_reload_simple_move): Likewise.
>       (rs6000_preferred_reload_class): Likewise.
>       (rs6000_secondary_reload_class): Likewise.
>       (print_operand): Make %A handle both FPRs and DMRs.
>       (rs6000_dmf_register_move_cost): New helper function.
>       (rs6000_register_move_cost): Add support for DMR registers.
>       (rs6000_memory_move_cost): Likewise.
>       (rs6000_compute_pressure_classes): Likewise.
>       (rs6000_debugger_regno): Likewise.
>       (rs6000_opt_masks): Add -mdense-math support.
>       (rs6000_split_multireg_move): Add support for DMRs.
>       * config/rs6000/rs6000.h (TARGET_MMA_NO_DENSE_MATH): New macro.
>       (UNITS_PER_DMF_WORD): Likewise.
>       (FIRST_PSEUDO_REGISTER): Update for DMRs.
>       (FIXED_REGISTERS): Add DMRs.
>       (CALL_REALLY_USED_REGISTERS): Likewise.
>       (REG_ALLOC_ORDER): Likewise.
>       (DMF_REGNO_P): New macro.
>       (enum reg_class): Add DM_REGS.
>       (REG_CLASS_NAMES): Likewise.
>       (REG_CLASS_CONTENTS): Likewise.
>       (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
>       (REGISTER_NAMES): Add DMF registers.
>       (ADDITIONAL_REGISTER_NAMES): Likewise.
>       * config/rs6000/rs6000.md (FIRST_DMF_REGNO): New constant.
>       (LAST_DMF_REGNO): Likewise.
>       * config/rs6000/rs6000.opt (-mdense-math): New option.
> ---
>  gcc/config/rs6000/mma.md            |  74 +++++++--
>  gcc/config/rs6000/predicates.md     |  21 ++-
>  gcc/config/rs6000/rs6000-builtin.cc |   5 +-
>  gcc/config/rs6000/rs6000-c.cc       |   9 +-
>  gcc/config/rs6000/rs6000-cpus.def   |   2 +
>  gcc/config/rs6000/rs6000.cc         | 231 +++++++++++++++++++++++-----
>  gcc/config/rs6000/rs6000.h          |  40 ++++-
>  gcc/config/rs6000/rs6000.md         |   2 +
>  gcc/config/rs6000/rs6000.opt        |   4 +
>  9 files changed, 325 insertions(+), 63 deletions(-)
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 9f866361376..3f5852ca2bb 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -90,6 +90,7 @@ (define_c_enum "unspec"
>     UNSPEC_MMA_XVI8GER4SPP
>     UNSPEC_MMA_XXMFACC
>     UNSPEC_MMA_XXMTACC
> +   UNSPEC_MMA_DMSETDMRZ
>    ])
>  
>  (define_c_enum "unspecv"
> @@ -313,7 +314,9 @@ (define_insn_and_split "*movoo"
>     (set_attr "length" "*,*,8")])
>  
>  
> -;; Vector quad support.  XOmode can only live in FPRs.
> +;; Vector quad support.  Under the original MMA, XOmode can only live in VSX
> +;; registers 0..31.  With dense math, XOmode can live in either VSX registers
> +;; (0..63) or DMF registers.

It should be (0..31), not (0..63).

>  (define_expand "movxo"
>    [(set (match_operand:XO 0 "nonimmediate_operand")
>       (match_operand:XO 1 "input_operand"))]
> @@ -338,10 +341,10 @@ (define_expand "movxo"
>      gcc_assert (false);
>  })
>  
> -(define_insn_and_split "*movxo"
> +(define_insn_and_split "*movxo_nodm"
>    [(set (match_operand:XO 0 "nonimmediate_operand" "=d,ZwO,d")
>       (match_operand:XO 1 "input_operand" "ZwO,d,d"))]
> -  "TARGET_MMA
> +  "TARGET_MMA_NO_DENSE_MATH
>     && (gpc_reg_operand (operands[0], XOmode)
>         || gpc_reg_operand (operands[1], XOmode))"
>    "@
> @@ -358,6 +361,31 @@ (define_insn_and_split "*movxo"
>     (set_attr "length" "*,*,16")
>     (set_attr "max_prefixed_insns" "2,2,*")])
>  
> +(define_insn_and_split "*movxo_dm"
> +  [(set (match_operand:XO 0 "nonimmediate_operand" "=wa,ZwO,wa,wD,wD,wa")
> +     (match_operand:XO 1 "input_operand"        "ZwO,wa, wa,wa,wD,wD"))]
> +  "TARGET_DENSE_MATH
> +   && (gpc_reg_operand (operands[0], XOmode)
> +       || gpc_reg_operand (operands[1], XOmode))"
> +  "@
> +   #
> +   #
> +   #
> +   dmxxinstdmr512 %0,%1,%Y1,0
> +   dmmr %0,%1
> +   dmxxextfdmr512 %0,%Y0,%1,0"
> +  "&& reload_completed
> +   && !dmf_operand (operands[0], XOmode)
> +   && !dmf_operand (operands[1], XOmode)"
> +  [(const_int 0)]
> +{
> +  rs6000_split_multireg_move (operands[0], operands[1]);
> +  DONE;
> +}
> +  [(set_attr "type" "vecload,vecstore,veclogical,mma,mma,mma")
> +   (set_attr "length" "*,*,16,*,*,*")
> +   (set_attr "max_prefixed_insns" "2,2,*,*,*,*")])
> +
>  (define_expand "vsx_assemble_pair"
>    [(match_operand:OO 0 "vsx_register_operand")
>     (match_operand:V16QI 1 "mma_assemble_input_operand")
> @@ -456,29 +484,53 @@ (define_expand "mma_disassemble_acc"
>    DONE;
>  })
>  
> -;; MMA instructions that do not use their accumulators as an input, still
> -;; must not allow their vector operands to overlap the registers used by
> -;; the accumulator.  We enforce this by marking the output as early clobber.
> +;; MMA instructions that do not use their accumulators as an input, still must
> +;; not allow their vector operands to overlap the registers used by the
> +;; accumulator.  We enforce this by marking the output as early clobber.  The
> +;; prime and de-prime instructions are not needed on systems with dense math
> +;; registers.
>  
>  (define_insn "mma_<acc>"
>    [(set (match_operand:XO 0 "accumulator_operand" "=&wD")
> -     (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")]
> +     (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
>                   MMA_ACC))]
> -  "TARGET_MMA"
> +  "TARGET_MMA_NO_DENSE_MATH"
>    "<acc> %A0"
>    [(set_attr "type" "mma")])
>  
>  ;; We can't have integer constants in XOmode so we wrap this in an
> -;; UNSPEC_VOLATILE.
> +;; UNSPEC_VOLATILE.  If we have dense math registers, we can just use a normal
> +;; UNSPEC instead of UNSPEC_VOLATILE.
>  
> -(define_insn "mma_xxsetaccz"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +(define_expand "mma_xxsetaccz"
> +  [(set (match_operand:XO 0 "accumulator_operand")
>       (unspec_volatile:XO [(const_int 0)]
>                           UNSPECV_MMA_XXSETACCZ))]
>    "TARGET_MMA"
> +{
> +  if (TARGET_DENSE_MATH)
> +    {
> +      emit_insn (gen_mma_dmsetdmrz (operands[0]));
> +      DONE;
> +    }
> +})
> +
> +(define_insn "*mma_xxsetaccz"
> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +     (unspec_volatile:XO [(const_int 0)]
> +                         UNSPECV_MMA_XXSETACCZ))]
> +  "TARGET_MMA_NO_DENSE_MATH"
>    "xxsetaccz %A0"
>    [(set_attr "type" "mma")])
>  
> +(define_insn "mma_dmsetdmrz"
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD")
> +     (unspec [(const_int 0)]
> +             UNSPEC_MMA_DMSETDMRZ))]
> +  "TARGET_DENSE_MATH"
> +  "dmsetdmrz %A0"
> +  [(set_attr "type" "mma")])
> +
>  (define_insn "mma_<vv>"
>    [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
>       (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 9f152037222..f1e03ec30c9 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -186,8 +186,23 @@ (define_predicate "vlogical_operand"
>    return VLOGICAL_REGNO_P (REGNO (op));
>  })
>  
> +;; Return 1 if op is a DMF register
> +(define_predicate "dmf_operand"
> +  (match_operand 0 "register_operand")
> +{
> +  if (!REG_P (op))
> +    return 0;
> +
> +  if (!HARD_REGISTER_P (op))
> +    return 1;
> +
> +  return DMF_REGNO_P (REGNO (op));
> +})
> +
>  ;; Return 1 if op is an accumulator.  On power10 systems, the accumulators
> -;; overlap with the FPRs.
> +;; overlap with the FPRs, while on systems with dense math, the accumulators
> +;; are separate dense math registers and do not overlap with the FPR
> +;; registers..
>  (define_predicate "accumulator_operand"
>    (match_operand 0 "register_operand")
>  {
> @@ -198,7 +213,9 @@ (define_predicate "accumulator_operand"
>      return 1;
>  
>    int r = REGNO (op);
> -  return FP_REGNO_P (r) && (r & 3) == 0;
> +  return (TARGET_DENSE_MATH
> +       ? DMF_REGNO_P (r)
> +       : FP_REGNO_P (r) && (r & 3) == 0);
>  })
>  
>  ;; Return 1 if op is the carry register.
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index bc1580f051b..6b7e5686f0c 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1125,8 +1125,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi,
>       }
>  
>        /* If we're disassembling an accumulator into a different type, we need
> -      to emit a xxmfacc instruction now, since we cannot do it later.  */
> -      if (fncode == RS6000_BIF_DISASSEMBLE_ACC)
> +      to emit a xxmfacc instruction now, since we cannot do it later.  If we
> +      have dense math registers, we don't need to do this.  */
> +      if (fncode == RS6000_BIF_DISASSEMBLE_ACC && !TARGET_DENSE_MATH)
>       {
>         new_decl = rs6000_builtin_decls[RS6000_BIF_XXMFACC_INTERNAL];
>         new_call = gimple_build_call (new_decl, 1, src);
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 6757a2477ad..e202fd6c7df 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -587,9 +587,14 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
>    if (rs6000_cpu == PROCESSOR_CELL)
>      rs6000_define_or_undefine_macro (define_p, "__PPU__");
>  
> -  /* Tell the user if we support the MMA instructions.  */
> +  /* Tell the user if we support the MMA instructions.  Also tell them if MMA
> +     uses the dense math registers.  */
>    if ((flags & OPTION_MASK_MMA) != 0)
> -    rs6000_define_or_undefine_macro (define_p, "__MMA__");
> +    {
> +      rs6000_define_or_undefine_macro (define_p, "__MMA__");
> +      if ((flags & OPTION_MASK_DENSE_MATH) != 0)
> +     rs6000_define_or_undefine_macro (define_p, "__DENSE_MATH__");
> +    }
>    /* Whether pc-relative code is being generated.  */
>    if ((flags & OPTION_MASK_PCREL) != 0)
>      rs6000_define_or_undefine_macro (define_p, "__PCREL__");
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index a0e6745495d..c03b069b779 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -91,6 +91,7 @@
>     will be fixed in potential future machines.  */
>  #define FUTURE_MASKS_SERVER  (POWER11_MASKS_SERVER                   \
>                                | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR    \
> +                              | OPTION_MASK_DENSE_MATH               \
>                                | OPTION_MASK_FUTURE)
>  
>  /* Flags that need to be turned off if -mno-vsx.  */
> @@ -124,6 +125,7 @@
>                                | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR    \
>                                | OPTION_MASK_CMPB                     \
>                                | OPTION_MASK_CRYPTO                   \
> +                              | OPTION_MASK_DENSE_MATH               \
>                                | OPTION_MASK_DFP                      \
>                                | OPTION_MASK_DLMZB                    \
>                                | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index ac95ea05657..570e8a14f2d 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -292,7 +292,8 @@ enum rs6000_reg_type {
>    ALTIVEC_REG_TYPE,
>    FPR_REG_TYPE,
>    SPR_REG_TYPE,
> -  CR_REG_TYPE
> +  CR_REG_TYPE,
> +  DMF_REG_TYPE

s/DMF/DMR

>  };
>  
>  /* Map register class to register type.  */
> @@ -306,22 +307,23 @@ static enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES];
>  
>  
>  /* Register classes we care about in secondary reload or go if legitimate
> -   address.  We only need to worry about GPR, FPR, and Altivec registers here,
> -   along an ANY field that is the OR of the 3 register classes.  */
> +   address.  We only need to worry about GPR, FPR, Altivec, and DMF registers
> +   here, along an ANY field that is the OR of the 4 register classes.  */
>  
>  enum rs6000_reload_reg_type {
>    RELOAD_REG_GPR,                    /* General purpose registers.  */
>    RELOAD_REG_FPR,                    /* Traditional floating point regs.  */
>    RELOAD_REG_VMX,                    /* Altivec (VMX) registers.  */
> -  RELOAD_REG_ANY,                    /* OR of GPR, FPR, Altivec masks.  */
> +  RELOAD_REG_DMF,                    /* DMF registers.  */

s/RELOAD_REG_DMF/RELOAD_REG_DMR

> +  RELOAD_REG_ANY,                    /* OR of GPR/FPR/VMX/DMF masks.  */

s/DMF/DMR

>    N_RELOAD_REG
>  };
>  
> -/* For setting up register classes, loop through the 3 register classes mapping
> +/* For setting up register classes, loop through the 4 register classes mapping
>     into real registers, and skip the ANY class, which is just an OR of the
>     bits.  */
>  #define FIRST_RELOAD_REG_CLASS       RELOAD_REG_GPR
> -#define LAST_RELOAD_REG_CLASS        RELOAD_REG_VMX
> +#define LAST_RELOAD_REG_CLASS        RELOAD_REG_DMF
>  
>  /* Map reload register type to a register in the register class.  */
>  struct reload_reg_map_type {
> @@ -333,6 +335,7 @@ static const struct reload_reg_map_type reload_reg_map[N_RELOAD_REG] = {
>    { "Gpr",   FIRST_GPR_REGNO },      /* RELOAD_REG_GPR.  */
>    { "Fpr",   FIRST_FPR_REGNO },      /* RELOAD_REG_FPR.  */
>    { "VMX",   FIRST_ALTIVEC_REGNO },  /* RELOAD_REG_VMX.  */
> +  { "DMF",   FIRST_DMF_REGNO },      /* RELOAD_REG_DMF.  */

s/DMF/DMR

>    { "Any",   -1 },                   /* RELOAD_REG_ANY.  */
>  };
>  
> @@ -1226,6 +1229,8 @@ char rs6000_reg_names[][8] =
>        "0",  "1",  "2",  "3",  "4",  "5",  "6",  "7",
>    /* vrsave vscr sfp */
>        "vrsave", "vscr", "sfp",
> +  /* DMFs */

s/DMF/DMR

> +      "0", "1", "2", "3", "4", "5", "6", "7",
>  };
>  
>  #ifdef TARGET_REGNAMES
> @@ -1252,6 +1257,8 @@ static const char alt_reg_names[][8] =
>    "%cr0",  "%cr1", "%cr2", "%cr3", "%cr4", "%cr5", "%cr6", "%cr7",
>    /* vrsave vscr sfp */
>    "vrsave", "vscr", "sfp",
> +  /* DMFs */

s/DMF/DMR

> +  "%dmr0", "%dmr1", "%dmr2", "%dmr3", "%dmr4", "%dmr5", "%dmr6", "%dmr7",
>  };
>  #endif
>  
> @@ -1842,6 +1849,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
>    else if (ALTIVEC_REGNO_P (regno))
>      reg_size = UNITS_PER_ALTIVEC_WORD;
>  
> +  else if (DMF_REGNO_P (regno))
> +    reg_size = UNITS_PER_DMF_WORD;
> +
>    else
>      reg_size = UNITS_PER_WORD;
>  
> @@ -1863,9 +1873,35 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
>    if (mode == OOmode)
>      return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0);
>  
> -  /* MMA accumulator modes need FPR registers divisible by 4.  */
> +  /* On ISA 3.1 (power10), MMA accumulator modes need FPR registers divisible
> +     by 4.
> +
> +     If dense math registers are enabled, we can allow all VSX registers plus
> +     the DMF registers.  VSX registers are used to load and store the registers
> +     as the accumulator registers do not have load and store instructions.
> +     Because we just use the VSX registers for load/store operations, we just
> +     need to make sure load vector pair and store vector pair instructions can
> +     be used.  */
>    if (mode == XOmode)
> -    return (TARGET_MMA && FP_REGNO_P (regno) && (regno & 3) == 0);
> +    {
> +      if (!TARGET_MMA)
> +     return 0;

XOmode can be in use even when TARGET_MMA is false, since dense math
registers may be used by non-MMA instructions.
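
One way the XOmode check could be restructured so that dense math is tested
first.  This is a standalone sketch with illustrative register ranges and
made-up names, not a drop-in replacement for the hunk:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative register ranges only; the real values live in the rs6000
   headers.  */
#define SKETCH_XO_FIRST_FPR 32
#define SKETCH_XO_LAST_FPR 63
#define SKETCH_XO_FIRST_VSX 32
#define SKETCH_XO_LAST_VSX 95
#define SKETCH_XO_FIRST_DMR 111
#define SKETCH_XO_LAST_DMR 118

/* Return true if REGNO lies in the inclusive range FIRST..LAST.  */
bool
sketch_xo_in_range (int regno, int first, int last)
{
  return regno >= first && regno <= last;
}

/* Model of "can XOmode live in REGNO..LAST_REGNO?".  Dense math is tested
   before MMA, so XOmode stays usable when MMA is off.  */
bool
sketch_xo_mode_ok (bool mma, bool dense_math, int regno, int last_regno)
{
  assert (regno <= last_regno);

  if (dense_math)
    {
      if (sketch_xo_in_range (regno, SKETCH_XO_FIRST_DMR, SKETCH_XO_LAST_DMR))
        return true;

      /* VSX registers are only used to load and store the value, so an
         even-numbered starting register within the VSX range is enough.  */
      return (sketch_xo_in_range (regno, SKETCH_XO_FIRST_VSX,
                                  SKETCH_XO_LAST_VSX)
              && sketch_xo_in_range (last_regno, SKETCH_XO_FIRST_VSX,
                                     SKETCH_XO_LAST_VSX)
              && (regno & 1) == 0);
    }

  /* Power10-style MMA: accumulators overlap FPRs divisible by 4.  */
  if (mma)
    return (sketch_xo_in_range (regno, SKETCH_XO_FIRST_FPR,
                                SKETCH_XO_LAST_FPR)
            && (regno & 3) == 0);

  return false;
}
```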

> +
> +      else if (!TARGET_DENSE_MATH)
> +     return (FP_REGNO_P (regno) && (regno & 3) == 0);
> +
> +      else if (DMF_REGNO_P (regno))
> +     return 1;
> +
> +      else
> +     return (VSX_REGNO_P (regno)
> +             && VSX_REGNO_P (last_regno)
> +             && (regno & 1) == 0);
> +    }
> +
> +  /* No other types other than XOmode can go in DMFs.  */
> +  if (DMF_REGNO_P (regno))
> +    return 0;
>  
>    /* PTImode can only go in GPRs.  Quad word memory operations require even/odd
>       register combinations, and use PTImode where we need to deal with quad
> @@ -2308,6 +2344,7 @@ rs6000_debug_reg_global (void)
>    rs6000_debug_reg_print (FIRST_ALTIVEC_REGNO,
>                         LAST_ALTIVEC_REGNO,
>                         "vs");
> +  rs6000_debug_reg_print (FIRST_DMF_REGNO, LAST_DMF_REGNO, "dmf");
>    rs6000_debug_reg_print (LR_REGNO, LR_REGNO, "lr");
>    rs6000_debug_reg_print (CTR_REGNO, CTR_REGNO, "ctr");
>    rs6000_debug_reg_print (CR0_REGNO, CR7_REGNO, "cr");
> @@ -2634,6 +2671,21 @@ rs6000_setup_reg_addr_masks (void)
>         addr_mask = 0;
>         reg = reload_reg_map[rc].reg;
>  
> +       /* Special case DMF registers.  */
> +       if (rc == RELOAD_REG_DMF)
> +         {
> +           if (TARGET_DENSE_MATH && m2 == XOmode)
> +             {
> +               addr_mask = RELOAD_REG_VALID;
> +               reg_addr[m].addr_mask[rc] = addr_mask;
> +               any_addr_mask |= addr_mask;
> +             }
> +           else
> +             reg_addr[m].addr_mask[rc] = 0;
> +
> +           continue;
> +         }
> +
>         /* Can mode values go in the GPR/FPR/Altivec registers?  */
>         if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg])
>           {
> @@ -2784,6 +2836,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>    for (r = CR1_REGNO; r <= CR7_REGNO; ++r)
>      rs6000_regno_regclass[r] = CR_REGS;
>  
> +  for (r = FIRST_DMF_REGNO; r <= LAST_DMF_REGNO; ++r)
> +    rs6000_regno_regclass[r] = DM_REGS;
> +
>    rs6000_regno_regclass[LR_REGNO] = LINK_REGS;
>    rs6000_regno_regclass[CTR_REGNO] = CTR_REGS;
>    rs6000_regno_regclass[CA_REGNO] = NO_REGS;
> @@ -2808,6 +2863,7 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>    reg_class_to_reg_type[(int)LINK_OR_CTR_REGS] = SPR_REG_TYPE;
>    reg_class_to_reg_type[(int)CR_REGS] = CR_REG_TYPE;
>    reg_class_to_reg_type[(int)CR0_REGS] = CR_REG_TYPE;
> +  reg_class_to_reg_type[(int)DM_REGS] = DMF_REG_TYPE;
>  
>    if (TARGET_VSX)
>      {
> @@ -2994,8 +3050,11 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>    if (TARGET_DIRECT_MOVE_128)
>      rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
>  
> +  /* Support for the accumulator registers, either FPR registers (aka original
> +     mma) or DMF registers (dense math).  */
>    if (TARGET_MMA)
> -    rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS;
> +    rs6000_constraints[RS6000_CONSTRAINT_wD]
> +      = TARGET_DENSE_MATH ? DM_REGS : FLOAT_REGS;
>  
>    /* Set up the reload helper and direct move functions.  */
>    if (TARGET_VSX || TARGET_ALTIVEC)
> @@ -4410,6 +4469,16 @@ rs6000_option_override_internal (bool global_init_p)
>    if (!TARGET_PCREL && TARGET_PCREL_OPT)
>      rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
>  
> +  /* Turn off dense math MMA+ options on non-future systems.  */
> +  if (TARGET_DENSE_MATH && (!TARGET_MMA || !TARGET_FUTURE))
> +    {
> +      if ((rs6000_isa_flags_explicit & OPTION_MASK_DENSE_MATH) != 0)
> +     error ("%qs requires %qs", "-mdense-math",
> +            (!TARGET_FUTURE ? "-mcpu=future" : "-mma"));
> +
> +      rs6000_isa_flags &= ~OPTION_MASK_DENSE_MATH;
> +    }
> +
>    if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
>      rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
>  
> @@ -12356,6 +12425,11 @@ rs6000_secondary_reload_memory (rtx addr,
>      addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_VMX]
>                & ~RELOAD_REG_AND_M16);
>  
> +  /* DMF registers use VSX registers for memory operations, and need to
> +     generate some extra instructions.  */
> +  else if (rclass == DM_REGS)
> +    return 2;
> +
>    /* If the register allocator hasn't made up its mind yet on the register
>       class to use, settle on defaults to use.  */
>    else if (rclass == NO_REGS)
> @@ -12684,6 +12758,13 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
>              || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE)))
>      return true;
>  
> +  /* We can transfer between VSX registers and DMF registers without needing
> +     extra registers.  */
> +  if (TARGET_DENSE_MATH && mode == XOmode
> +      && ((to_type == DMF_REG_TYPE && from_type == VSX_REG_TYPE)
> +       || (to_type == VSX_REG_TYPE && from_type == DMF_REG_TYPE)))
> +    return true;
> +
>    return false;
>  }
>  
> @@ -13378,6 +13459,10 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
>    machine_mode mode = GET_MODE (x);
>    bool is_constant = CONSTANT_P (x);
>  
> +  /* DMF registers can't be loaded or stored.  */
> +  if (rclass == DM_REGS)
> +    return NO_REGS;
> +
>    /* If a mode can't go in FPR/ALTIVEC/VSX registers, don't return a preferred
>       reload class for it.  */
>    if ((rclass == ALTIVEC_REGS || rclass == VSX_REGS)
> @@ -13474,7 +13559,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
>       return VSX_REGS;
>  
>        if (mode == XOmode)
> -     return FLOAT_REGS;
> +     return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS;
>  
>        if (GET_MODE_CLASS (mode) == MODE_INT)
>       return GENERAL_REGS;
> @@ -13599,6 +13684,11 @@ rs6000_secondary_reload_class (enum reg_class rclass, machine_mode mode,
>    else
>      regno = -1;
>  
> +  /* DMF registers don't have loads or stores.  We have to go through the VSX
> +     registers to load XOmode (vector quad).  */
> +  if (TARGET_DENSE_MATH && rclass == DM_REGS)
> +    return VSX_REGS;
> +
>    /* If we have VSX register moves, prefer moving scalar values between
>       Altivec registers and GPR by going via an FPR (and then via memory)
>       instead of reloading the secondary memory address for Altivec moves.  */
> @@ -14130,8 +14220,19 @@ print_operand (FILE *file, rtx x, int code)
>        output_operand.  */
>  
>      case 'A':
> -      /* Write the MMA accumulator number associated with VSX register X.  */
> -      if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
> +      /* Write the MMA accumulator number associated with VSX register X.  On
> +      dense math systems, only allow DMF accumulators, not accumulators
> +      overlapping with the FPR registers.  */
> +      if (!REG_P (x))
> +     output_operand_lossage ("invalid %%A value");
> +      else if (TARGET_DENSE_MATH)
> +     {
> +       if (DMF_REGNO_P (REGNO (x)))
> +         fprintf (file, "%d", REGNO (x) - FIRST_DMF_REGNO);
> +       else
> +         output_operand_lossage ("%%A operand is not a DMF");
> +     }
> +      else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
>       output_operand_lossage ("invalid %%A value");
>        else
>       fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
> @@ -22751,6 +22852,31 @@ rs6000_debug_address_cost (rtx x, machine_mode mode,
>  }
>  
>  
> +/* Subroutine to determine the move cost of dense math registers.  If we are
> +   moving to/from VSX_REGISTER registers, the cost is either 1 move (for
> +   512-bit accumulators) or 2 moves (for 1,024 dmf registers).  If we are
> +   moving to anything else like GPR registers, make the cost very high.  */
> +
> +static int
> +rs6000_dmf_register_move_cost (machine_mode mode, reg_class_t rclass)
> +{
> +  const int reg_move_base = 2;
> +  HARD_REG_SET vsx_set = (reg_class_contents[rclass]
> +                       & reg_class_contents[VSX_REGS]);
> +
> +  if (TARGET_DENSE_MATH && !hard_reg_set_empty_p (vsx_set))
> +    {
> +      /* __vector_quad (i.e. XOmode) is tranfered in 1 instruction.  */
> +      if (mode == XOmode)
> +     return reg_move_base;
> +
> +      else
> +     return reg_move_base * 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
> +    }
> +
> +  return 1000 * 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
> +}
> +
>  /* A C expression returning the cost of moving data from a register of class
>     CLASS1 to one of CLASS2.  */
>  
> @@ -22764,17 +22890,28 @@ rs6000_register_move_cost (machine_mode mode,
>    if (TARGET_DEBUG_COST)
>      dbg_cost_ctrl++;
>  
> +  HARD_REG_SET to_vsx, from_vsx;
> +  to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
> +  from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
> +
> +  /* Special case DMF registers, that can only move to/from VSX registers.  */
> +  if (from == DM_REGS && to == DM_REGS)
> +    ret = 2 * hard_regno_nregs (FIRST_DMF_REGNO, mode);
> +
> +  else if (from == DM_REGS)
> +    ret = rs6000_dmf_register_move_cost (mode, to);
> +
> +  else if (to == DM_REGS)
> +    ret = rs6000_dmf_register_move_cost (mode, from);
> +
>    /* If we have VSX, we can easily move between FPR or Altivec registers,
>       otherwise we can only easily move within classes.
>       Do this first so we give best-case answers for union classes
>       containing both gprs and vsx regs.  */
> -  HARD_REG_SET to_vsx, from_vsx;
> -  to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
> -  from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
> -  if (!hard_reg_set_empty_p (to_vsx)
> -      && !hard_reg_set_empty_p (from_vsx)
> -      && (TARGET_VSX
> -       || hard_reg_set_intersect_p (to_vsx, from_vsx)))
> +  else if (!hard_reg_set_empty_p (to_vsx)
> +        && !hard_reg_set_empty_p (from_vsx)
> +        && (TARGET_VSX
> +            || hard_reg_set_intersect_p (to_vsx, from_vsx)))
>      {
>        int reg = FIRST_FPR_REGNO;
>        if (TARGET_VSX
> @@ -22870,6 +23007,9 @@ rs6000_memory_move_cost (machine_mode mode, reg_class_t rclass,
>      ret = 4 * hard_regno_nregs (32, mode);
>    else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS))
>      ret = 4 * hard_regno_nregs (FIRST_ALTIVEC_REGNO, mode);
> +  else if (reg_classes_intersect_p (rclass, DM_REGS))
> +    ret = (rs6000_dmf_register_move_cost (mode, VSX_REGS)
> +        + rs6000_memory_move_cost (mode, VSX_REGS, false));
>    else
>      ret = 4 + rs6000_register_move_cost (mode, rclass, GENERAL_REGS);
>  
> @@ -24078,6 +24218,8 @@ rs6000_compute_pressure_classes (enum reg_class *pressure_classes)
>        if (TARGET_HARD_FLOAT)
>       pressure_classes[n++] = FLOAT_REGS;
>      }
> +  if (TARGET_DENSE_MATH)
> +    pressure_classes[n++] = DM_REGS;
>    pressure_classes[n++] = CR_REGS;
>    pressure_classes[n++] = SPECIAL_REGS;
>  
> @@ -24242,6 +24384,10 @@ rs6000_debugger_regno (unsigned int regno, unsigned int format)
>      return 67;
>    if (regno == 64)
>      return 64;
> +  /* XXX: This is a guess.  The GCC register number for FIRST_DMF_REGNO is 111,
> +     but the frame pointer regnum uses that.  */
> +  if (DMF_REGNO_P (regno))
> +    return regno - FIRST_DMF_REGNO + 112;
>  
>    gcc_unreachable ();
>  }
> @@ -24463,6 +24609,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
>                                                               false, true  },
>    { "cmpb",                  OPTION_MASK_CMPB,               false, true  },
>    { "crypto",                        OPTION_MASK_CRYPTO,             false, true  },
> +  { "dense-math",            OPTION_MASK_DENSE_MATH,         false, true  },
>    { "direct-move",           0,                              false, true  },
>    { "dlmzb",                 OPTION_MASK_DLMZB,              false, true  },
>    { "efficient-unaligned-vsx",       OPTION_MASK_EFFICIENT_UNALIGNED_VSX,
> @@ -27480,9 +27627,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>         unsigned offset = 0;
>         unsigned size = GET_MODE_SIZE (reg_mode);
>  
> -       /* If we are reading an accumulator register, we have to
> -          deprime it before we can access it.  */
> -       if (TARGET_MMA
> +       /* If we are reading an accumulator register, we have to deprime it
> +          before we can access it unless we have dense math registers.  */
> +       if (TARGET_MMA_NO_DENSE_MATH
>             && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>           emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27514,9 +27661,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>             emit_insn (gen_rtx_SET (dst2, src2));
>           }
>  
> -       /* If we are writing an accumulator register, we have to
> -          prime it after we've written it.  */
> -       if (TARGET_MMA
> +       /* If we are writing an accumulator register, we have to prime it
> +          after we've written it unless we have dense math registers.  */
> +       if (TARGET_MMA_NO_DENSE_MATH
>             && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>           emit_insn (gen_mma_xxmtacc (dst, dst));
>  
> @@ -27530,7 +27677,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>                     || XINT (src, 1) == UNSPECV_MMA_ASSEMBLE);
>         gcc_assert (REG_P (dst));
>         if (GET_MODE (src) == XOmode)
> -         gcc_assert (FP_REGNO_P (REGNO (dst)));
> +         gcc_assert ((TARGET_DENSE_MATH
> +                      ? VSX_REGNO_P (REGNO (dst))
> +                      : FP_REGNO_P (REGNO (dst))));
>         if (GET_MODE (src) == OOmode)
>           gcc_assert (VSX_REGNO_P (REGNO (dst)));
>  
> @@ -27583,9 +27732,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>             emit_insn (gen_rtx_SET (dst_i, op));
>           }
>  
> -       /* We are writing an accumulator register, so we have to
> -          prime it after we've written it.  */
> -       if (GET_MODE (src) == XOmode)
> +       /* We are writing an accumulator register, so we have to prime it
> +          after we've written it unless we have dense math registers.  */
> +       if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH)
>           emit_insn (gen_mma_xxmtacc (dst, dst));
>  
>         return;
> @@ -27596,9 +27745,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  
>    if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
>      {
> -      /* If we are reading an accumulator register, we have to
> -      deprime it before we can access it.  */
> -      if (TARGET_MMA
> +      /* If we are reading an accumulator register, we have to deprime it
> +      before we can access it unless we have dense math registers.  */
> +      if (TARGET_MMA_NO_DENSE_MATH
>         && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>       emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27624,9 +27773,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>                                                        i * reg_mode_size)));
>       }
>  
> -      /* If we are writing an accumulator register, we have to
> -      prime it after we've written it.  */
> -      if (TARGET_MMA
> +      /* If we are writing an accumulator register, we have to prime it after
> +      we've written it unless we have dense math registers.  */
> +      if (TARGET_MMA_NO_DENSE_MATH
>         && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>       emit_insn (gen_mma_xxmtacc (dst, dst));
>      }
> @@ -27761,9 +27910,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>           gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
>       }
>  
> -      /* If we are reading an accumulator register, we have to
> -      deprime it before we can access it.  */
> -      if (TARGET_MMA && REG_P (src)
> +      /* If we are reading an accumulator register, we have to deprime it
> +      before we can access it unless we have dense math registers.  */
> +      if (TARGET_MMA_NO_DENSE_MATH && REG_P (src)
>         && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>       emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27793,9 +27942,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>                                                        j * reg_mode_size)));
>       }
>  
> -      /* If we are writing an accumulator register, we have to
> -      prime it after we've written it.  */
> -      if (TARGET_MMA && REG_P (dst)
> +      /* If we are writing an accumulator register, we have to prime it after
> +      we've written it unless we have dense math registers.  */
> +      if (TARGET_MMA_NO_DENSE_MATH && REG_P (dst)
>         && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>       emit_insn (gen_mma_xxmtacc (dst, dst));
>  
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index d1f953630f7..169d81e208e 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -556,6 +556,9 @@ extern int rs6000_vector_align[];
>  #define TARGET_DIRECT_MOVE_64BIT     (TARGET_DIRECT_MOVE             \
>                                        && TARGET_POWERPC64)
>  
> +/* Whether we have MMA support without dense math support.  */
> +#define TARGET_MMA_NO_DENSE_MATH     (TARGET_MMA && !TARGET_DENSE_MATH)
> +
>  /* Inlining allows targets to define the meanings of bits in target_info
>     field of ipa_fn_summary by itself, the used bits for rs6000 are listed
>     below.  */
> @@ -653,6 +656,7 @@ extern unsigned char rs6000_recip_bits[];
>  #define UNITS_PER_FP_WORD 8
>  #define UNITS_PER_ALTIVEC_WORD 16
>  #define UNITS_PER_VSX_WORD 16
> +#define UNITS_PER_DMF_WORD 128
>  
>  /* Type used for ptrdiff_t, as a string used in a declaration.  */
>  #define PTRDIFF_TYPE "int"
> @@ -766,7 +770,7 @@ enum data_align { align_abi, align_opt, align_both };
>     Another pseudo (not included in DWARF_FRAME_REGISTERS) is soft frame
>     pointer, which is eventually eliminated in favor of SP or FP.  */
>  
> -#define FIRST_PSEUDO_REGISTER 111
> +#define FIRST_PSEUDO_REGISTER 119
>  
>  /* Use standard DWARF numbering for DWARF debugging information.  */
>  #define DEBUGGER_REGNO(REGNO) rs6000_debugger_regno ((REGNO), 0)
> @@ -803,7 +807,9 @@ enum data_align { align_abi, align_opt, align_both };
>     /* cr0..cr7 */                               \
>     0, 0, 0, 0, 0, 0, 0, 0,                      \
>     /* vrsave vscr sfp */                        \
> -   1, 1, 1                                      \
> +   1, 1, 1,                                     \
> +   /* DMF registers.  */                        \
> +   0, 0, 0, 0, 0, 0, 0, 0                       \
>  }
>  
>  /* Like `CALL_USED_REGISTERS' except this macro doesn't require that
> @@ -827,7 +833,9 @@ enum data_align { align_abi, align_opt, align_both };
>     /* cr0..cr7 */                               \
>     1, 1, 0, 0, 0, 1, 1, 1,                      \
>     /* vrsave vscr sfp */                        \
> -   0, 0, 0                                      \
> +   0, 0, 0,                                     \
> +   /* DMF registers.  */                        \
> +   0, 0, 0, 0, 0, 0, 0, 0                       \
>  }
>  
>  #define TOTAL_ALTIVEC_REGS   (LAST_ALTIVEC_REGNO - FIRST_ALTIVEC_REGNO + 1)
> @@ -864,6 +872,7 @@ enum data_align { align_abi, align_opt, align_both };
>       v2              (not saved; incoming vector arg reg; return value)
>       v19 - v14       (not saved or used for anything)
>       v31 - v20       (saved; order given to save least number)
> +     dmr0 - dmr7     (not saved)
>       vrsave, vscr    (fixed)
>       sfp             (fixed)
>  */
> @@ -906,6 +915,9 @@ enum data_align { align_abi, align_opt, align_both };
>     66,                                                               \
>     83, 82, 81, 80, 79, 78,                                   \
>     95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84,           \
> +   /* DMF registers.  */                                     \
> +   111, 112, 113, 114, 115, 116, 117, 118,                   \
> +   /* Vrsave, vscr, sfp.  */                                 \
>     108, 109,                                                 \
>     110                                                               \
>  }
> @@ -932,6 +944,9 @@ enum data_align { align_abi, align_opt, align_both };
>  /* True if register is a VSX register.  */
>  #define VSX_REGNO_P(N) (FP_REGNO_P (N) || ALTIVEC_REGNO_P (N))
>  
> +/* True if register is a DMF register.  */
> +#define DMF_REGNO_P(N) ((N) >= FIRST_DMF_REGNO && (N) <= LAST_DMF_REGNO)
> +
>  /* Alternate name for any vector register supporting floating point, no matter
>     which instruction set(s) are available.  */
>  #define VFLOAT_REGNO_P(N) \
> @@ -1069,6 +1084,7 @@ enum reg_class
>    FLOAT_REGS,
>    ALTIVEC_REGS,
>    VSX_REGS,
> +  DM_REGS,
>    VRSAVE_REGS,
>    VSCR_REGS,
>    GEN_OR_FLOAT_REGS,
> @@ -1098,6 +1114,7 @@ enum reg_class
>    "FLOAT_REGS",                                                      \
>    "ALTIVEC_REGS",                                                    \
>    "VSX_REGS",                                                        \
> +  "DM_REGS",                                                         \
>    "VRSAVE_REGS",                                                     \
>    "VSCR_REGS",                                                       \
>    "GEN_OR_FLOAT_REGS",                                               \
> @@ -1132,6 +1149,8 @@ enum reg_class
>    { 0x00000000, 0x00000000, 0xffffffff, 0x00000000 },                \
>    /* VSX_REGS.  */                                                   \
>    { 0x00000000, 0xffffffff, 0xffffffff, 0x00000000 },                \
> +  /* DM_REGS.  */                                                    \
> +  { 0x00000000, 0x00000000, 0x00000000, 0x007f8000 },                \
>    /* VRSAVE_REGS.  */                                                \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00001000 },                \
>    /* VSCR_REGS.  */                                                  \
> @@ -1159,7 +1178,7 @@ enum reg_class
>    /* CA_REGS.  */                                                    \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000004 },                \
>    /* ALL_REGS.  */                                                   \
> -  { 0xffffffff, 0xffffffff, 0xffffffff, 0x00007fff }                 \
> +  { 0xffffffff, 0xffffffff, 0xffffffff, 0x007fffff }                 \
>  }
>  
>  /* The same information, inverted:
> @@ -2060,7 +2079,16 @@ extern char rs6000_reg_names[][8];     /* register names (0 vs. %r0).  */
>    &rs6000_reg_names[108][0], /* vrsave  */                           \
>    &rs6000_reg_names[109][0], /* vscr  */                             \
>                                                                       \
> -  &rs6000_reg_names[110][0]  /* sfp  */                              \
> +  &rs6000_reg_names[110][0], /* sfp  */                              \
> +                                                                     \
> +  &rs6000_reg_names[111][0], /* dmr0  */                             \
> +  &rs6000_reg_names[112][0], /* dmr1  */                             \
> +  &rs6000_reg_names[113][0], /* dmr2  */                             \
> +  &rs6000_reg_names[114][0], /* dmr3  */                             \
> +  &rs6000_reg_names[115][0], /* dmr4  */                             \
> +  &rs6000_reg_names[116][0], /* dmr5  */                             \
> +  &rs6000_reg_names[117][0], /* dmr6  */                             \
> +  &rs6000_reg_names[118][0], /* dmr7  */                             \
>  }
>  
>  /* Table of additional register names to use in user input.  */
> @@ -2114,6 +2142,8 @@ extern char rs6000_reg_names[][8];      /* register names (0 vs. %r0).  */
>    {"vs52", 84}, {"vs53", 85}, {"vs54", 86}, {"vs55", 87},    \
>    {"vs56", 88}, {"vs57", 89}, {"vs58", 90}, {"vs59", 91},    \
>    {"vs60", 92}, {"vs61", 93}, {"vs62", 94}, {"vs63", 95},    \
> +  {"dmr0", 111}, {"dmr1", 112}, {"dmr2", 113}, {"dmr3", 114},        \
> +  {"dmr4", 115}, {"dmr5", 116}, {"dmr6", 117}, {"dmr7", 118},        \
>  }
>  
>  /* This is how to output an element of a case-vector that is relative.  */
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index ff085bf9bb1..0717e86e9d6 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -51,6 +51,8 @@ (define_constants
>     (VRSAVE_REGNO             108)
>     (VSCR_REGNO                       109)
>     (FRAME_POINTER_REGNUM     110)
> +   (FIRST_DMF_REGNO          111)
> +   (LAST_DMF_REGNO           118)
>    ])
>  
>  ;;
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 7c4f0375424..72578644037 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -639,6 +639,10 @@ mfuture
>  Target Undocumented Mask(FUTURE) Var(rs6000_isa_flags) Warn(Do not use %<-mfuture>, use %<-mcpu=future>)
>  Generate (do not generate) potential future instructions.
>  
> +mdense_math
> +Target Mask(DENSE_MATH) Var(rs6000_isa_flags)
> +Generate (do not generate) dense math MMA+ instructions.

The dense math registers can be used by MMA+ as well as non-MMA+
instructions, so the help text above needs to be reworded. The
-mdense-math flag should enable/disable use of the dense math registers.
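
Perhaps something along these lines. This is only a sketch: the help
text wording is a suggestion, and it assumes the option is meant to be
spelled -mdense-math and marked Undocumented, as the patch description
says.

```
mdense-math
Target Undocumented Mask(DENSE_MATH) Var(rs6000_isa_flags)
Generate (do not generate) code that uses the dense math registers.
```

Worded this way, the help text describes the register file rather than
any particular group of instructions.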

-Surya

> +
>  ; Documented parameters
>  
>  -param=rs6000-vect-unroll-limit=
