Re: [PATCH V3, 2/4] Add support for dense math registers on a future PowerPC

Surya Kumari Jangala Tue, 10 Feb 2026 02:12:58 -0800

Hi Mike,

This is a very big patch; can you please break it up?
We could have one patch with the mma.md changes, another with the
OPTION_MASK_DENSE_MATH changes, and so on.
More comments are inline:


On 03/02/26 3:27 am, Michael Meissner wrote:
> The MMA subsystem added the notion of accumulator registers as an optional
> feature of ISA 3.1 (power10 and power11).  In ISA 3.1, these accumulators
> overlapped with the VSX registers 0..31, but logically the accumulator registers
> were separate from the FPR registers.  In ISA 3.1, it was anticipated that in
> future systems, the accumulator registers may not overlap with the FPR registers.
> This patch adds the support for dense math registers as separate registers.
> 
> This patch updates the wD constraint added in the previous patch.  If MMA is
> selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint
> will allow access to accumulators that overlap with VSX registers 0..31.  If
> both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint
> will only allow dense math registers.
> 
> This patch modifies the existing %A output modifier.  If MMA is selected but
> dense math is not selected, then %A output modifier converts the VSX register
> number to the accumulator number, by dividing it by 4.  If both MMA and dense
> math are selected, then %A will map the separate dense math registers into 0..7.
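
To check my understanding of the two %A mappings described above, here is a
small standalone C model. It is only a sketch: the register numbers in the
enum are placeholders I made up (not the values from rs6000.md), and
accumulator_number is a hypothetical helper, not a function in the patch.

```c
#include <stdbool.h>

/* Placeholder hard register numbers, for illustration only.  The real
   values come from rs6000.md and rs6000.h.  */
enum
{
  FIRST_FPR = 32, LAST_FPR = 63,	/* FPRs, i.e. VSX registers 0..31.  */
  FIRST_DM = 111, LAST_DM = 118		/* Eight dense math registers.  */
};

/* Return the accumulator number that %A would print for hard register
   REGNO, or -1 if REGNO is not a valid accumulator in this mode.  */
static int
accumulator_number (int regno, bool dense_math)
{
  if (dense_math)
    {
      /* Separate registers: map them directly onto 0..7.  */
      if (regno >= FIRST_DM && regno <= LAST_DM)
	return regno - FIRST_DM;
      return -1;
    }

  /* Overlaid accumulators: four adjacent FPRs form one accumulator, so
     the first FPR must be a multiple of 4.  */
  if (regno < FIRST_FPR || regno > LAST_FPR
      || (regno - FIRST_FPR) % 4 != 0)
    return -1;

  return (regno - FIRST_FPR) / 4;
}
```

With this model, FPR 4 (regno 36 here) prints as accumulator 1 without dense
math, and is rejected when dense math is enabled.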
> 
> The intention is that user code using extended asm can be modified to run on
> both MMA without dense math and MMA with dense math:
> 
>     1)        If possible, don't use extended asm, but instead use the MMA built-in
>       functions;
> 
>     2)        If you do need to write extended asm, the d constraints
>       targeting accumulators should now use wD;
> 
>     3)        Only use the built-in zero, assemble and disassemble functions to
>       move data between vector quad types and dense math accumulators.
>       I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz directly in the
>       extended asm code.  The reason is these instructions assume there is a
>       1-to-1 correspondence between 4 adjacent FPR registers and an
>       accumulator that overlaps with those instructions.  With accumulators
>       now being separate registers, there no longer is a 1-to-1
>       correspondence.
> 
> It is possible that the mangling for dense math registers and the GDB register
> numbers may need to be changed in the future.
> 
> The patches have been tested on both little and big endian systems.  Can I check
> it into the master branch?
> 
> The differences in this patch from the version posted on November 14th, 2025
> include:
> 
>    1) I updated some comments.
> 
>    2) I deleted the macro TARGET_MMA_NO_DENSE_MATH and I just used TARGET_MMA
>       && !TARGET_DENSE_MATH.

From the patch, I got the sense that (TARGET_MMA && !TARGET_DENSE_MATH) is being
used to denote power10. This is confusing and non-intuitive. Let us use
TARGET_MMA_P10 (or something similar) to avoid any confusion.
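
Something along these lines is what I have in mind. This is a sketch only:
the flag variables are stand-ins for the real rs6000_isa_flags tests,
TARGET_MMA_P10 is a suggested name, and its expansion below simply mirrors
the expression the patch uses today, which may not be the right final
condition.

```c
#include <stdbool.h>

/* Stand-ins for the real target flags.  In GCC these are derived from
   rs6000_isa_flags, not from plain variables.  */
static bool target_mma;
static bool target_dense_math;

#define TARGET_MMA		(target_mma)
#define TARGET_DENSE_MATH	(target_dense_math)

/* Suggested name: MMA with accumulators overlaid on FPRs 0..31, as on
   power10.  The expansion is an assumption copied from the patch.  */
#define TARGET_MMA_P10		(TARGET_MMA && !TARGET_DENSE_MATH)

/* Hypothetical helper so the macro can be exercised outside of GCC.  */
static bool
mma_p10_p (bool mma, bool dense_math)
{
  target_mma = mma;
  target_dense_math = dense_math;
  return TARGET_MMA_P10;
}
```

The insn conditions in mma.md could then test the single macro instead of
repeating the two-flag expression.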

> 
>    3) I changed the predicate dmf_operand to be dense_math_operand.

Let's name the operand as dmf_register_operand, to be in sync with how the
other operands are named such as altivec_register_operand, vsx_register_operand
etc.

> 
>    4) For one set of insns (mma_<acc>) that only are used on systems without
>       dense math, I changed the constraint back from 'wD' to 'd'.

The xxmfacc and xxmtacc instructions can be used on future processors too.
The future processor will probably have the instructions dmxxmtacc and
dmxxmfacc, with xxmtacc and xxmfacc as extended mnemonics.

> 
>    5) For the accumulator_operand predicate, I added support for subregs.
> 
>    6) I changed the name of the macro DMF_REGNO_P to DM_REGNO_P.

DMF_REGNO_P is fine. Let's not change it.

> 
>    7) I changed the code to depend on just -mdense-math to enable the dense
>       math registers, and not a combination of -mmma and -mdense-math.  The
>       intention is since other things can use the new dense math registers
>       (like some future cryptography instructions).
> 
>    8) I made the following changes to macros and enumerations to talk about
>       dense math registers (DM or DMR) instead of dense math facility
>       (i.e. the changes to MMA that uses the dense math registers):
> 
>           DMF_REG_TYPE       to DM_REG_TYPE

s/DM_REG_TYPE/DMR_REG_TYPE. Just like we have GPR_REG_TYPE.

>           RELOAD_REG_DMF     to RELOAD_REG_DMR
>           UNITS_PER_DMF_WORD to UNITS_PER_DM_WORD

UNITS_PER_DMF_WORD is perfectly fine. Please don't change it.

>           FIRST_DMF_REGNO    to FIRST_DM_REGNO
>           LAST_DMF_REGNO     to LAST_DM_REGNO

Let's have FIRST_DMR_REGNO and LAST_DMR_REGNO, just like we have
FIRST_GPR_REGNO.

> 
>    9) I updated the wording for the -mdense-math option, and I documented it
>       in invoke.texi.
> 
> gcc/
> 
> 2026-02-02   Michael Meissner  <[email protected]>
> 
>       * config/rs6000/mma.md (UNSPEC_MMA_DMSETDMRZ): New unspec.
>       (movxo): Move comment about XOmode being restricted to FPRs to
>       movxo_mode.
>       (movxo_nodm): Rename from movxo and restrict the usage to machines
>       without dense math registers.
>       (movxo_dm): New insn for movxo support for machines with dense math
>       registers.
>       (mma_<acc>): Restrict usage to machines without dense math registers.
>       (mma_xxsetaccz): Add a define_expand wrapper, and add support for dense
>       math registers.
>       (mma_dmsetaccz): New insn.

I don't see this change in the patch.

>       (mma_<vv>): Add comment about MMA using or not using dense math
>       registers.
>       * config/rs6000/predicates.md (dense_math_operand): New predicate.
>       (accumulator_operand): Add support for dense math registers.  Add
>       support for SUBREGs.
>       * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
>       not issue a de-prime instruction when disassembling a vector quad on a
>       system with dense math registers.
>       * config/rs6000/rs6000-c.cc (rs6000_define_or_undefine_macro): Define
>       __DENSE_MATH__ if we have dense math registers.
>       * config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): Add -mdense-math.
>       (POWERPC_MASKS): Likewise.
>       * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DM_REG_TYPE.
>       (enum rs6000_reload_reg_type): Add RELOAD_REG_DM.
>       (LAST_RELOAD_REG_CLASS): Add support for dense math registers and the wD
>       constraint.
>       (reload_reg_map): Likewise.
>       (rs6000_reg_names): Likewise.
>       (alt_reg_names): Likewise.
>       (rs6000_hard_regno_nregs_internal): Likewise.
>       (rs6000_hard_regno_mode_ok_uncached): Likewise.
>       (rs6000_debug_reg_global): Likewise.
>       (rs6000_setup_reg_addr_masks): Likewise.
>       (rs6000_init_hard_regno_mode_ok): Likewise.
>       (rs6000_option_override_internal): If -mdense-math, issue an error if
>       not -mcpu=future.
>       (rs6000_secondary_reload_memory): Add support for dense math registers.
>       (rs6000_secondary_reload_simple_move): Likewise.
>       (rs6000_preferred_reload_class): Likewise.
>       (rs6000_secondary_reload_class): Likewise.
>       (print_operand): Make %A handle both dense math registers or FPRs
>       depending on whether dense registers are available.
>       (rs6000_dense_math_register_move_cost): New helper function.
>       (rs6000_register_move_cost): Add support for dense math registers.
>       (rs6000_memory_move_cost): Likewise.
>       (rs6000_compute_pressure_classes): Likewise.
>       (rs6000_debugger_regno): Likewise.
>       (rs6000_opt_masks): Add -mdense-math support.
>       (rs6000_split_multireg_move): Add support for dense math registers.
>       * config/rs6000/rs6000.h (UNITS_PER_DM_WORD): Likewise.
>       (FIRST_PSEUDO_REGISTER): Update for dense math registers.
>       (FIXED_REGISTERS): Add dense math registers.
>       (CALL_REALLY_USED_REGISTERS): Likewise.
>       (REG_ALLOC_ORDER): Likewise.
>       (DM_REGNO_P): New macro.
>       (enum reg_class): Add DM_REGS.
>       (REG_CLASS_NAMES): Likewise.
>       (REG_CLASS_CONTENTS): Likewise.
>       (enum r6000_reg_class_enum): Add RS6000_CONSTRAINT_wD.
>       (REGISTER_NAMES): Add dense math registers.
>       (ADDITIONAL_REGISTER_NAMES): Likewise.
>       * config/rs6000/rs6000.md (FIRST_DM_REGNO): New constant.
>       (LAST_DM_REGNO): Likewise.
>       * config/rs6000/rs6000.opt (-mdense-math): New option.
>       * doc/invoke.texi (RS/6000 and PowerPC Options): Add -mdense-math.
> ---
>  gcc/config/rs6000/mma.md            |  88 +++++++++--
>  gcc/config/rs6000/predicates.md     |  29 +++-
>  gcc/config/rs6000/rs6000-builtin.cc |   5 +-
>  gcc/config/rs6000/rs6000-c.cc       |   4 +
>  gcc/config/rs6000/rs6000-cpus.def   |   2 +
>  gcc/config/rs6000/rs6000.cc         | 229 +++++++++++++++++++++++-----
>  gcc/config/rs6000/rs6000.h          |  37 ++++-
>  gcc/config/rs6000/rs6000.md         |   2 +
>  gcc/config/rs6000/rs6000.opt        |   4 +
>  gcc/doc/invoke.texi                 |   7 +
>  10 files changed, 344 insertions(+), 63 deletions(-)
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index de79b232eb1..72037161ac9 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -90,6 +90,7 @@ (define_c_enum "unspec"
>     UNSPEC_MMA_XVI8GER4SPP
>     UNSPEC_MMA_XXMFACC
>     UNSPEC_MMA_XXMTACC
> +   UNSPEC_MMA_DMSETDMRZ
>    ])
>  
>  (define_c_enum "unspecv"
> @@ -313,7 +314,7 @@ (define_insn_and_split "*movoo"
>     (set_attr "length" "*,*,8")])
>  
>  
> -;; Vector quad support.  XOmode can only live in FPRs.
> +;; Vector quad support.
>  (define_expand "movxo"
>    [(set (match_operand:XO 0 "nonimmediate_operand")
>       (match_operand:XO 1 "input_operand"))]
> @@ -338,10 +339,13 @@ (define_expand "movxo"
>      gcc_assert (false);
>  })
>  
> -(define_insn_and_split "*movxo"
> +;; If we do not have dense math registers, XOmode can only live in FPR
> +;; registers (0..31).
> +
> +(define_insn_and_split "*movxo_nodm"
>    [(set (match_operand:XO 0 "nonimmediate_operand" "=d,ZwO,d")
>       (match_operand:XO 1 "input_operand" "ZwO,d,d"))]
> -  "TARGET_MMA
> +  "TARGET_MMA && !TARGET_DENSE_MATH

It would be correct here to just say TARGET_MMA_P10 or some such.

>     && (gpc_reg_operand (operands[0], XOmode)
>         || gpc_reg_operand (operands[1], XOmode))"
>    "@
> @@ -358,6 +362,34 @@ (define_insn_and_split "*movxo"
>     (set_attr "length" "*,*,16")
>     (set_attr "max_prefixed_insns" "2,2,*")])
>  
> +;; If dense math registers are available, XOmode can live in either VSX
> +;; registers (0..63) or dense math registers.
> +
> +(define_insn_and_split "*movxo_dm"
> +  [(set (match_operand:XO 0 "nonimmediate_operand" "=wa,ZwO,wa,wD,wD,wa")
> +     (match_operand:XO 1 "input_operand"        "ZwO,wa, wa,wa,wD,wD"))]
> +  "TARGET_DENSE_MATH
> +   && (gpc_reg_operand (operands[0], XOmode)
> +       || gpc_reg_operand (operands[1], XOmode))"
> +  "@
> +   #
> +   #
> +   #
> +   dmxxinstdmr512 %0,%1,%Y1,0
> +   dmmr %0,%1
> +   dmxxextfdmr512 %0,%Y0,%1,0"
> +  "&& reload_completed
> +   && !dense_math_operand (operands[0], XOmode)
> +   && !dense_math_operand (operands[1], XOmode)"
> +  [(const_int 0)]
> +{
> +  rs6000_split_multireg_move (operands[0], operands[1]);
> +  DONE;
> +}
> +  [(set_attr "type" "vecload,vecstore,veclogical,mma,mma,mma")
> +   (set_attr "length" "*,*,16,*,*,*")
> +   (set_attr "max_prefixed_insns" "2,2,*,*,*,*")])
> +
>  (define_expand "vsx_assemble_pair"
>    [(match_operand:OO 0 "vsx_register_operand")
>     (match_operand:V16QI 1 "mma_assemble_input_operand")
> @@ -456,29 +488,61 @@ (define_expand "mma_disassemble_acc"
>    DONE;
>  })
>  
> -;; MMA instructions that do not use their accumulators as an input, still
> -;; must not allow their vector operands to overlap the registers used by
> -;; the accumulator.  We enforce this by marking the output as early clobber.
> +;; If dense math registers are not available, MMA instructions that do
> +;; not use their accumulators that overlap with FPR registers as an
> +;; input, still must not allow their vector operands to overlap the
> +;; registers used by the accumulator.  We enforce this by marking the
> +;; output as early clobber.  The prime and de-prime instructions are
> +;; not needed on systems with dense math registers.
>  
>  (define_insn "mma_<acc>"
> -  [(set (match_operand:XO 0 "accumulator_operand" "=&wD")
> -     (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=&d")
> +     (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
>                   MMA_ACC))]
> -  "TARGET_MMA"
> +  "TARGET_MMA && !TARGET_DENSE_MATH"

The xxmfacc and xxmtacc instructions can be used on future processors too.
The future processor will probably have the instructions dmxxmtacc and
dmxxmfacc, with xxmtacc and xxmfacc as extended mnemonics.

>    "<acc> %A0"
>    [(set_attr "type" "mma")])
>  
>  ;; We can't have integer constants in XOmode so we wrap this in an
> -;; UNSPEC_VOLATILE.
> +;; UNSPEC_VOLATILE.  If we have dense math registers, we can just use a normal
> +;; UNSPEC instead of UNSPEC_VOLATILE.
>  
> -(define_insn "mma_xxsetaccz"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +(define_expand "mma_xxsetaccz"
> +  [(set (match_operand:XO 0 "accumulator_operand")
>       (unspec_volatile:XO [(const_int 0)]
>                           UNSPECV_MMA_XXSETACCZ))]
>    "TARGET_MMA"

The 'set accumulator to zero' instruction does not depend on MMA.
We should remove TARGET_MMA.

> +{
> +  if (TARGET_DENSE_MATH)
> +    {
> +      emit_insn (gen_mma_dmsetdmrz (operands[0]));

The future processor has an instruction dmsetaccz for which the
extended mnemonic is xxsetaccz.

'dmsetaccz' zeroes out 512 bits while dmsetdmrz zeroes out all the
1024 bits. We should emit dmsetaccz here.

> +      DONE;
> +    }
> +})
> +
> +;; Clear accumulator without dense math registers
> +(define_insn "*mma_xxsetaccz"
> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +     (unspec_volatile:XO [(const_int 0)]
> +                         UNSPECV_MMA_XXSETACCZ))]
> +  "TARGET_MMA && !TARGET_DENSE_MATH"
>    "xxsetaccz %A0"

xxsetaccz is available as an extended mnemonic on a future processor.

>    [(set_attr "type" "mma")])
>  
> +;; Clear accumulator when dense math registers are available.
> +(define_insn "mma_dmsetdmrz"
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD")
> +     (unspec [(const_int 0)]
> +             UNSPEC_MMA_DMSETDMRZ))]
> +  "TARGET_DENSE_MATH"
> +  "dmsetdmrz %A0"
> +  [(set_attr "type" "mma")])
> +
> +;; MMA operations below.  If dense math registers are available, these
> +;; operations will use the 8 accumultors which are separate registers.
> +;; If dense math registers are not available, these operations will use
> +;; accumulators that are overlaid on top of the FPR registers.

This comment is incorrect.
On a future processor that supports dense math registers, if the dense math
facility is turned off (perhaps by clearing a bit in the MSR), then any use
of MMA instructions is an error. In that case, the accumulator registers are
not overlaid on the FPR registers.
On Power10, the accumulators are overlaid on the FPR registers.

> +
>  (define_insn "mma_<vv>"
>    [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
>       (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 8f296ec00b7..5de81d54507 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -186,11 +186,32 @@ (define_predicate "vlogical_operand"
>    return VLOGICAL_REGNO_P (REGNO (op));
>  })
>  
> -;; Return 1 if op is an accumulator.  On power10 systems, the accumulators
> -;; overlap with the FPRs.
> +;; Return 1 if op is a dense math register
> +(define_predicate "dense_math_operand"
> +  (match_operand 0 "register_operand")
> +{
> +  if (!REG_P (op))
> +    return 0;
> +
> +  if (!HARD_REGISTER_P (op))
> +    return 1;
> +
> +  return DM_REGNO_P (REGNO (op));
> +})
> +
> +;; Return 1 if op is an accumulator.
> +;;
> +;; On power10 and power11 systems, the accumulators overlap with the
> +;; FPRs and the register must be divisible by 4.
> +;;
> +;; On systems with dense math registers, the accumulators are separate
> +;; registers and do not overlap with the FPR registers.
>  (define_predicate "accumulator_operand"
>    (match_operand 0 "register_operand")
>  {
> +  if (SUBREG_P (op))
> +    op = SUBREG_REG (op);
> +
>    if (!REG_P (op))
>      return 0;
>  
> @@ -198,7 +219,9 @@ (define_predicate "accumulator_operand"
>      return 1;
>  
>    int r = REGNO (op);
> -  return FP_REGNO_P (r) && (r & 3) == 0;
> +  return (TARGET_DENSE_MATH
> +       ? DM_REGNO_P (r)
> +       : FP_REGNO_P (r) && (r & 3) == 0);
>  })
>  
>  ;; Return 1 if op is the carry register.
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 45c88fe063b..084eaab5b96 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1125,8 +1125,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi,
>       }
>  
>        /* If we're disassembling an accumulator into a different type, we need
> -      to emit a xxmfacc instruction now, since we cannot do it later.  */
> -      if (fncode == RS6000_BIF_DISASSEMBLE_ACC)
> +      to emit a xxmfacc instruction now, since we cannot do it later.  If we
> +      have dense math registers, we don't need to do this.  */

This comment is not clear. Can you please clarify why we don't need to do this
if we have dense math registers?

> +      if (fncode == RS6000_BIF_DISASSEMBLE_ACC && !TARGET_DENSE_MATH)
>       {
>         new_decl = rs6000_builtin_decls[RS6000_BIF_XXMFACC_INTERNAL];
>         new_call = gimple_build_call (new_decl, 1, src);
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index eb6a881aa9b..a7eb951b014 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -590,6 +590,10 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
>    /* Tell the user if we support the MMA instructions.  */
>    if ((flags & OPTION_MASK_MMA) != 0)
>      rs6000_define_or_undefine_macro (define_p, "__MMA__");
> +  /* Tell the user if we support the dense math registers for use with MMA and
> +     cryptography.  */
> +  if ((flags & OPTION_MASK_DENSE_MATH) != 0)
> +    rs6000_define_or_undefine_macro (define_p, "__DENSE_MATH__");
>    /* Whether pc-relative code is being generated.  */
>    if ((flags & OPTION_MASK_PCREL) != 0)
>      rs6000_define_or_undefine_macro (define_p, "__PCREL__");
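
For reference, this is how I would expect user code to key off the new
macro. A minimal sketch: __MMA__ and __DENSE_MATH__ are the macros defined
in this function, while accumulator_kind and the strings it returns are
hypothetical names used only for illustration.

```c
/* Report which kind of accumulator the compiler is targeting, based on
   the predefined macros.  On a non-PowerPC host neither macro is
   defined and the function returns "none".  */
static const char *
accumulator_kind (void)
{
#if defined (__DENSE_MATH__)
  /* Separate dense math registers are available.  */
  return "dense-math";
#elif defined (__MMA__)
  /* MMA accumulators overlaid on FPRs 0..31.  */
  return "fpr-overlay";
#else
  return "none";
#endif
}
```

Since -mdense-math no longer depends on -mmma, __DENSE_MATH__ has to be
tested first, and code that needs MMA instructions should still check
__MMA__ separately.
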
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index dc67e287672..3e51848481f 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -91,6 +91,7 @@
>     will be fixed in potential future machines.  */
>  #define FUTURE_MASKS_SERVER  (POWER11_MASKS_SERVER                   \
>                                | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR    \
> +                              | OPTION_MASK_DENSE_MATH               \
>                                | OPTION_MASK_FUTURE)
>  
>  /* Flags that need to be turned off if -mno-vsx.  */
> @@ -124,6 +125,7 @@
>                                | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR    \
>                                | OPTION_MASK_CMPB                     \
>                                | OPTION_MASK_CRYPTO                   \
> +                              | OPTION_MASK_DENSE_MATH               \
>                                | OPTION_MASK_DFP                      \
>                                | OPTION_MASK_DLMZB                    \
>                                | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 388ed591b97..131ac9902ca 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -292,7 +292,8 @@ enum rs6000_reg_type {
>    ALTIVEC_REG_TYPE,
>    FPR_REG_TYPE,
>    SPR_REG_TYPE,
> -  CR_REG_TYPE
> +  CR_REG_TYPE,
> +  DM_REG_TYPE

s/DM_REG_TYPE/DMR_REG_TYPE

>  };
>  
>  /* Map register class to register type.  */
> @@ -306,22 +307,24 @@ static enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES];
>  
>  
>  /* Register classes we care about in secondary reload or go if legitimate
> -   address.  We only need to worry about GPR, FPR, and Altivec registers here,
> -   along an ANY field that is the OR of the 3 register classes.  */
> +   address.  We only need to worry about GPR, FPR, Altivec, and dense math
> +   registers here, along an ANY field that is the OR of the 4 register
> +   classes.  */
>  
>  enum rs6000_reload_reg_type {
>    RELOAD_REG_GPR,                    /* General purpose registers.  */
>    RELOAD_REG_FPR,                    /* Traditional floating point regs.  */
>    RELOAD_REG_VMX,                    /* Altivec (VMX) registers.  */
> -  RELOAD_REG_ANY,                    /* OR of GPR, FPR, Altivec masks.  */
> +  RELOAD_REG_DMR,                    /* Dense math registers.  */
> +  RELOAD_REG_ANY,                    /* OR of GPR/FPR/VMX/DMR masks.  */
>    N_RELOAD_REG
>  };
>  
> -/* For setting up register classes, loop through the 3 register classes mapping
> +/* For setting up register classes, loop through the 4 register classes mapping
>     into real registers, and skip the ANY class, which is just an OR of the
>     bits.  */
>  #define FIRST_RELOAD_REG_CLASS       RELOAD_REG_GPR
> -#define LAST_RELOAD_REG_CLASS        RELOAD_REG_VMX
> +#define LAST_RELOAD_REG_CLASS        RELOAD_REG_DMR
>  
>  /* Map reload register type to a register in the register class.  */
>  struct reload_reg_map_type {
> @@ -333,6 +336,7 @@ static const struct reload_reg_map_type reload_reg_map[N_RELOAD_REG] = {
>    { "Gpr",   FIRST_GPR_REGNO },      /* RELOAD_REG_GPR.  */
>    { "Fpr",   FIRST_FPR_REGNO },      /* RELOAD_REG_FPR.  */
>    { "VMX",   FIRST_ALTIVEC_REGNO },  /* RELOAD_REG_VMX.  */
> +  { "Dmr",   FIRST_DM_REGNO },       /* RELOAD_REG_DMR.  */
>    { "Any",   -1 },                   /* RELOAD_REG_ANY.  */
>  };
>  
> @@ -1226,6 +1230,8 @@ char rs6000_reg_names[][8] =
>        "0",  "1",  "2",  "3",  "4",  "5",  "6",  "7",
>    /* vrsave vscr sfp */
>        "vrsave", "vscr", "sfp",
> +  /* dense math registers.  */
> +      "0", "1", "2", "3", "4", "5", "6", "7",
>  };
>  
>  #ifdef TARGET_REGNAMES
> @@ -1252,6 +1258,8 @@ static const char alt_reg_names[][8] =
>    "%cr0",  "%cr1", "%cr2", "%cr3", "%cr4", "%cr5", "%cr6", "%cr7",
>    /* vrsave vscr sfp */
>    "vrsave", "vscr", "sfp",
> +  /* dense math registers.  */
> +  "%dmr0", "%dmr1", "%dmr2", "%dmr3", "%dmr4", "%dmr5", "%dmr6", "%dmr7",
>  };
>  #endif
>  
> @@ -1842,6 +1850,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
>    else if (ALTIVEC_REGNO_P (regno))
>      reg_size = UNITS_PER_ALTIVEC_WORD;
>  
> +  else if (DM_REGNO_P (regno))
> +    reg_size = UNITS_PER_DM_WORD;
> +
>    else
>      reg_size = UNITS_PER_WORD;
>  
> @@ -1863,9 +1874,32 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
>    if (mode == OOmode)
>      return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0);
>  
> -  /* MMA accumulator modes need FPR registers divisible by 4.  */
> +  /* On ISA 3.1 (power10), MMA accumulator modes need FPR registers divisible
> +     by 4.
> +
> +     If dense math registers are enabled, we can allow all VSX registers plus
> +     the dense math registers.  VSX registers are used to load and store the
> +     registers as the accumulator registers do not have load and store
> +     instructions.  Because we just use the VSX registers for load/store
> +     operations, we just need to make sure load vector pair and store vector
> +     pair instructions can be used.  */
>    if (mode == XOmode)
> -    return (TARGET_MMA && FP_REGNO_P (regno) && (regno & 3) == 0);
> +    {
> +      if (!TARGET_DENSE_MATH)

Here, we are assuming that !TARGET_DENSE_MATH means -mcpu is power10.
However, for -mcpu=future, we can turn off the dense math facility. 
Any use of MMA instructions in this case will throw an error.

> +     return (FP_REGNO_P (regno) && (regno & 3) == 0);
> +
> +      else if (DM_REGNO_P (regno))
> +     return 1;
> +
> +      else
> +     return (VSX_REGNO_P (regno)
> +             && VSX_REGNO_P (last_regno)
> +             && (regno & 1) == 0);
> +    }
> +
> +  /* No other types other than XOmode can go in dense math registers.  */
> +  if (DM_REGNO_P (regno))
> +    return 0;
>  
>    /* PTImode can only go in GPRs.  Quad word memory operations require even/odd
>       register combinations, and use PTImode where we need to deal with quad
> @@ -2308,6 +2342,7 @@ rs6000_debug_reg_global (void)
>    rs6000_debug_reg_print (FIRST_ALTIVEC_REGNO,
>                         LAST_ALTIVEC_REGNO,
>                         "vs");
> +  rs6000_debug_reg_print (FIRST_DM_REGNO, LAST_DM_REGNO, "dense_math");

s/dense_math/dm

>    rs6000_debug_reg_print (LR_REGNO, LR_REGNO, "lr");
>    rs6000_debug_reg_print (CTR_REGNO, CTR_REGNO, "ctr");
>    rs6000_debug_reg_print (CR0_REGNO, CR7_REGNO, "cr");
> @@ -2634,6 +2669,21 @@ rs6000_setup_reg_addr_masks (void)
>         addr_mask = 0;
>         reg = reload_reg_map[rc].reg;
>  
> +       /* Special case dense math registers.  */
> +       if (rc == RELOAD_REG_DMR)
> +         {
> +           if (TARGET_DENSE_MATH && m2 == XOmode)
> +             {
> +               addr_mask = RELOAD_REG_VALID;
> +               reg_addr[m].addr_mask[rc] = addr_mask;
> +               any_addr_mask |= addr_mask;
> +             }
> +           else
> +             reg_addr[m].addr_mask[rc] = 0;
> +
> +           continue;
> +         }
> +
>         /* Can mode values go in the GPR/FPR/Altivec registers?  */
>         if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg])
>           {
> @@ -2784,6 +2834,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>    for (r = CR1_REGNO; r <= CR7_REGNO; ++r)
>      rs6000_regno_regclass[r] = CR_REGS;
>  
> +  for (r = FIRST_DM_REGNO; r <= LAST_DM_REGNO; ++r)
> +    rs6000_regno_regclass[r] = DM_REGS;
> +
>    rs6000_regno_regclass[LR_REGNO] = LINK_REGS;
>    rs6000_regno_regclass[CTR_REGNO] = CTR_REGS;
>    rs6000_regno_regclass[CA_REGNO] = NO_REGS;
> @@ -2808,6 +2861,7 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>    reg_class_to_reg_type[(int)LINK_OR_CTR_REGS] = SPR_REG_TYPE;
>    reg_class_to_reg_type[(int)CR_REGS] = CR_REG_TYPE;
>    reg_class_to_reg_type[(int)CR0_REGS] = CR_REG_TYPE;
> +  reg_class_to_reg_type[(int)DM_REGS] = DM_REG_TYPE;
>  
>    if (TARGET_VSX)
>      {
> @@ -2994,8 +3048,11 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>    if (TARGET_DIRECT_MOVE_128)
>      rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
>  
> +  /* Support for the accumulator registers, either FPR registers (aka original
> +     mma) or dense math registers.  */
>    if (TARGET_MMA)
> -    rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS;
> +    rs6000_constraints[RS6000_CONSTRAINT_wD]
> +      = TARGET_DENSE_MATH ? DM_REGS : FLOAT_REGS;
>  
>    /* Set up the reload helper and direct move functions.  */
>    if (TARGET_VSX || TARGET_ALTIVEC)
> @@ -4410,6 +4467,15 @@ rs6000_option_override_internal (bool global_init_p)
>    if (!TARGET_PCREL && TARGET_PCREL_OPT)
>      rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
>  
> +  /* Turn off dense math register support on non-future systems.  */
> +  if (TARGET_DENSE_MATH && !TARGET_FUTURE)
> +    {
> +      if ((rs6000_isa_flags_explicit & OPTION_MASK_DENSE_MATH) != 0)
> +     error ("%qs requires %qs", "-mdense-math", "-mcpu=future");
> +
> +      rs6000_isa_flags &= ~OPTION_MASK_DENSE_MATH;
> +    }
> +

We should also flag an error for the following case:
TARGET_FUTURE && TARGET_MMA && !TARGET_DENSE_MATH
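
Roughly, the combined check could look like the sketch below. It is a
standalone model, not GCC code: struct target_opts and
check_dense_math_options are hypothetical, the booleans stand in for the
rs6000_isa_flags tests, and the wording of the second message is only a
placeholder.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the option state kept in rs6000_isa_flags and
   rs6000_isa_flags_explicit.  */
struct target_opts
{
  bool future;			/* -mcpu=future  */
  bool mma;			/* -mmma  */
  bool dense_math;		/* -mdense-math  */
  bool dense_math_explicit;	/* -mdense-math given by the user  */
};

/* Validate the option combination.  Return an error message, or NULL
   if the combination is acceptable.  Dense math is silently dropped
   when it was only enabled implicitly on a non-future CPU.  */
static const char *
check_dense_math_options (struct target_opts *o)
{
  if (o->dense_math && !o->future)
    {
      o->dense_math = false;
      if (o->dense_math_explicit)
	return "-mdense-math requires -mcpu=future";
    }

  /* The additional case: MMA on a future CPU with dense math off.  */
  if (o->future && o->mma && !o->dense_math)
    return "-mmma requires -mdense-math with -mcpu=future";

  return NULL;
}
```

In the real function both cases would call error () rather than return a
string; the model returns the message so the logic is easy to test.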

>    if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
>      rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
>  
> @@ -12356,6 +12422,11 @@ rs6000_secondary_reload_memory (rtx addr,
>      addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_VMX]
>                & ~RELOAD_REG_AND_M16);
>  
> +  /* Dense math registers use VSX registers for memory operations, and need to
> +     generate some extra instructions.  */
> +  else if (rclass == DM_REGS)
> +    return 2;
> +
>    /* If the register allocator hasn't made up its mind yet on the register
>       class to use, settle on defaults to use.  */
>    else if (rclass == NO_REGS)
> @@ -12684,6 +12755,13 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
>              || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE)))
>      return true;
>  
> +  /* We can transfer between VSX registers and dense math registers without
> +     needing extra registers.  */
> +  if (TARGET_DENSE_MATH && mode == XOmode
> +      && ((to_type == DM_REG_TYPE && from_type == VSX_REG_TYPE)
> +       || (to_type == VSX_REG_TYPE && from_type == DM_REG_TYPE)))
> +    return true;
> +
>    return false;
>  }
>  
> @@ -13378,6 +13456,10 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
>    machine_mode mode = GET_MODE (x);
>    bool is_constant = CONSTANT_P (x);
>  
> +  /* Dense math registers can't be loaded or stored.  */
> +  if (rclass == DM_REGS)
> +    return NO_REGS;
> +
>    /* If a mode can't go in FPR/ALTIVEC/VSX registers, don't return a preferred
>       reload class for it.  */
>    if ((rclass == ALTIVEC_REGS || rclass == VSX_REGS)
> @@ -13474,7 +13556,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
>       return VSX_REGS;
>  
>        if (mode == XOmode)
> -     return FLOAT_REGS;
> +     return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS;
>  
>        if (GET_MODE_CLASS (mode) == MODE_INT)
>       return GENERAL_REGS;
> @@ -13599,6 +13681,11 @@ rs6000_secondary_reload_class (enum reg_class rclass, machine_mode mode,
>    else
>      regno = -1;
>  
> +  /* Dense math registers don't have loads or stores.  We have to go through
> +     the VSX registers to load XOmode (vector quad).  */
> +  if (TARGET_DENSE_MATH && rclass == DM_REGS)
> +    return VSX_REGS;
> +
>    /* If we have VSX register moves, prefer moving scalar values between
>       Altivec registers and GPR by going via an FPR (and then via memory)
>       instead of reloading the secondary memory address for Altivec moves.  */
> @@ -14130,8 +14217,20 @@ print_operand (FILE *file, rtx x, int code)
>        output_operand.  */
>  
>      case 'A':
> -      /* Write the MMA accumulator number associated with VSX register X.  */
> -      if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
> +      /* Write the MMA accumulator number associated with VSX register X.  On
> +      dense math systems, only allow dense math accumulators, not
> +      accumulators overlapping with the FPR registers.  */
> +      if (!REG_P (x))
> +     output_operand_lossage ("invalid %%A value");
> +      else if (TARGET_DENSE_MATH)
> +     {
> +       if (DM_REGNO_P (REGNO (x)))
> +         fprintf (file, "%d", REGNO (x) - FIRST_DM_REGNO);
> +       else
> +         output_operand_lossage ("%%A operand is not a "
> +                                 "dense math register");
> +     }
> +      else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
>       output_operand_lossage ("invalid %%A value");
>        else
>       fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
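
To check my reading of the new %A behaviour, here is a small standalone
sketch (not GCC code). accumulator_number is a made-up helper, the dense
math register numbers are the ones this patch adds, and I am assuming the
FPRs occupy hard registers 32..63. A return value of -1 stands in for the
places where print_operand would call output_operand_lossage.

```c
#include <assert.h>

/* FIRST_DM_REGNO/LAST_DM_REGNO are taken from this patch; the FPR range
   32..63 is an assumption about the existing port.  */
enum
{
  FIRST_FPR_REGNO = 32,
  LAST_FPR_REGNO = 63,
  FIRST_DM_REGNO = 111,
  LAST_DM_REGNO = 118
};

/* Hypothetical helper mirroring the 'A' case: return the accumulator
   number that would be printed, or -1 for an invalid operand.  */
static int
accumulator_number (int regno, int dense_math)
{
  assert (regno >= 0);

  /* With dense math, only the separate dense math registers are valid.  */
  if (dense_math)
    return (regno >= FIRST_DM_REGNO && regno <= LAST_DM_REGNO
            ? regno - FIRST_DM_REGNO
            : -1);

  /* Without dense math, an accumulator is an FPR whose number is a
     multiple of 4; four consecutive FPRs form one accumulator.  */
  if (regno < FIRST_FPR_REGNO || regno > LAST_FPR_REGNO || regno % 4 != 0)
    return -1;

  return (regno - FIRST_FPR_REGNO) / 4;
}
```

If that reading is right, the same asm template prints 0..7 in both
configurations, and an FPR-based accumulator is rejected once dense math
is enabled.
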
> @@ -22751,6 +22850,31 @@ rs6000_debug_address_cost (rtx x, machine_mode mode,
>  }
>  
>  
> +/* Subroutine to determine the move cost of dense math registers.  If we are
> +   moving to/from VSX_REGISTER registers, the cost is either 1 move (for
> +   512-bit accumulators) or 2 moves (for 1,024 dense math registers).  If we are
> +   moving to anything else like GPR registers, make the cost very high.  */
> +
> +static int
> +rs6000_dense_math_register_move_cost (machine_mode mode, reg_class_t rclass)
> +{
> +  const int reg_move_base = 2;
> +  HARD_REG_SET vsx_set = (reg_class_contents[rclass]
> +                       & reg_class_contents[VSX_REGS]);
> +
> +  if (TARGET_DENSE_MATH && !hard_reg_set_empty_p (vsx_set))
> +    {
> +      /* __vector_quad (i.e. XOmode) is tranfered in 1 instruction.  */
> +      if (mode == XOmode)
> +     return reg_move_base;
> +
> +      else
> +     return reg_move_base * 2 * hard_regno_nregs (FIRST_DM_REGNO, mode);
> +    }
> +
> +  return 1000 * 2 * hard_regno_nregs (FIRST_DM_REGNO, mode);
> +}
> +
>  /* A C expression returning the cost of moving data from a register of class
>     CLASS1 to one of CLASS2.  */
>  
> @@ -22764,17 +22888,28 @@ rs6000_register_move_cost (machine_mode mode,
>    if (TARGET_DEBUG_COST)
>      dbg_cost_ctrl++;
>  
> +  HARD_REG_SET to_vsx, from_vsx;
> +  to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
> +  from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
> +
> +  /* Special case dense math registers, that can only move to/from VSX registers.  */
> +  if (from == DM_REGS && to == DM_REGS)
> +    ret = 2 * hard_regno_nregs (FIRST_DM_REGNO, mode);
> +
> +  else if (from == DM_REGS)
> +    ret = rs6000_dense_math_register_move_cost (mode, to);
> +
> +  else if (to == DM_REGS)
> +    ret = rs6000_dense_math_register_move_cost (mode, from);
> +
>    /* If we have VSX, we can easily move between FPR or Altivec registers,
>       otherwise we can only easily move within classes.
>       Do this first so we give best-case answers for union classes
>       containing both gprs and vsx regs.  */
> -  HARD_REG_SET to_vsx, from_vsx;
> -  to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
> -  from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
> -  if (!hard_reg_set_empty_p (to_vsx)
> -      && !hard_reg_set_empty_p (from_vsx)
> -      && (TARGET_VSX
> -       || hard_reg_set_intersect_p (to_vsx, from_vsx)))
> +  else if (!hard_reg_set_empty_p (to_vsx)
> +        && !hard_reg_set_empty_p (from_vsx)
> +        && (TARGET_VSX
> +            || hard_reg_set_intersect_p (to_vsx, from_vsx)))
>      {
>        int reg = FIRST_FPR_REGNO;
>        if (TARGET_VSX
> @@ -22870,6 +23005,9 @@ rs6000_memory_move_cost (machine_mode mode, reg_class_t rclass,
>      ret = 4 * hard_regno_nregs (32, mode);
>    else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS))
>      ret = 4 * hard_regno_nregs (FIRST_ALTIVEC_REGNO, mode);
> +  else if (reg_classes_intersect_p (rclass, DM_REGS))
> +    ret = (rs6000_dense_math_register_move_cost (mode, VSX_REGS)
> +        + rs6000_memory_move_cost (mode, VSX_REGS, false));
>    else
>      ret = 4 + rs6000_register_move_cost (mode, rclass, GENERAL_REGS);
>  
> @@ -24078,6 +24216,8 @@ rs6000_compute_pressure_classes (enum reg_class *pressure_classes)
>        if (TARGET_HARD_FLOAT)
>       pressure_classes[n++] = FLOAT_REGS;
>      }
> +  if (TARGET_DENSE_MATH)
> +    pressure_classes[n++] = DM_REGS;
>    pressure_classes[n++] = CR_REGS;
>    pressure_classes[n++] = SPECIAL_REGS;
>  
> @@ -24242,6 +24382,10 @@ rs6000_debugger_regno (unsigned int regno, unsigned int format)
>      return 67;
>    if (regno == 64)
>      return 64;
> +  /* XXX: This is a guess.  The GCC register number for FIRST_DM_REGNO is 111,
> +     but the frame pointer regnum uses that.  */
> +  if (DM_REGNO_P (regno))
> +    return regno - FIRST_DM_REGNO + 112;
>  
>    gcc_unreachable ();
>  }
> @@ -24463,6 +24607,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
>                                                               false, true  },
>    { "cmpb",                  OPTION_MASK_CMPB,               false, true  },
>    { "crypto",                        OPTION_MASK_CRYPTO,             false, true  },
> +  { "dense-math",            OPTION_MASK_DENSE_MATH,         false, true  },
>    { "direct-move",           0,                              false, true  },
>    { "dlmzb",                 OPTION_MASK_DLMZB,              false, true  },
>    { "efficient-unaligned-vsx",       OPTION_MASK_EFFICIENT_UNALIGNED_VSX,
> @@ -27480,9 +27625,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>         unsigned offset = 0;
>         unsigned size = GET_MODE_SIZE (reg_mode);
>  
> -       /* If we are reading an accumulator register, we have to
> -          deprime it before we can access it.  */
> -       if (TARGET_MMA
> +       /* If we are reading an accumulator register, we have to deprime it
> +          before we can access it unless we have dense math registers.  */
> +       if (TARGET_MMA && !TARGET_DENSE_MATH
>             && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>           emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27514,9 +27659,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>             emit_insn (gen_rtx_SET (dst2, src2));
>           }
>  
> -       /* If we are writing an accumulator register, we have to
> -          prime it after we've written it.  */
> -       if (TARGET_MMA
> +       /* If we are writing an accumulator register, we have to prime it
> +          after we've written it unless we have dense math registers.  */
> +       if (TARGET_MMA && !TARGET_DENSE_MATH
>             && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>           emit_insn (gen_mma_xxmtacc (dst, dst));
>  
> @@ -27530,7 +27675,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>                     || XINT (src, 1) == UNSPECV_MMA_ASSEMBLE);
>         gcc_assert (REG_P (dst));
>         if (GET_MODE (src) == XOmode)
> -         gcc_assert (FP_REGNO_P (REGNO (dst)));
> +         gcc_assert ((TARGET_DENSE_MATH
> +                      ? VSX_REGNO_P (REGNO (dst))
> +                      : FP_REGNO_P (REGNO (dst))));
>         if (GET_MODE (src) == OOmode)
>           gcc_assert (VSX_REGNO_P (REGNO (dst)));
>  
> @@ -27583,9 +27730,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>             emit_insn (gen_rtx_SET (dst_i, op));
>           }
>  
> -       /* We are writing an accumulator register, so we have to
> -          prime it after we've written it.  */
> -       if (GET_MODE (src) == XOmode)
> +       /* We are writing an accumulator register, so we have to prime it
> +          after we've written it unless we have dense math registers.  */
> +       if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH)
>           emit_insn (gen_mma_xxmtacc (dst, dst));
>  
>         return;
> @@ -27596,9 +27743,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  
>    if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
>      {
> -      /* If we are reading an accumulator register, we have to
> -      deprime it before we can access it.  */
> -      if (TARGET_MMA
> +      /* If we are reading an accumulator register, we have to deprime it
> +      before we can access it unless we have dense math registers.  */
> +      if (TARGET_MMA && !TARGET_DENSE_MATH
>         && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>       emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27624,9 +27771,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>                                                        i * reg_mode_size)));
>       }
>  
> -      /* If we are writing an accumulator register, we have to
> -      prime it after we've written it.  */
> -      if (TARGET_MMA
> +      /* If we are writing an accumulator register, we have to prime it after
> +      we've written it unless we have dense math registers.  */
> +      if (TARGET_MMA && !TARGET_DENSE_MATH
>         && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>       emit_insn (gen_mma_xxmtacc (dst, dst));
>      }
> @@ -27761,9 +27908,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>           gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
>       }
>  
> -      /* If we are reading an accumulator register, we have to
> -      deprime it before we can access it.  */
> -      if (TARGET_MMA && REG_P (src)
> +      /* If we are reading an accumulator register, we have to deprime it
> +      before we can access it unless we have dense math registers.  */
> +      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (src)
>         && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>       emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27793,9 +27940,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>                                                        j * reg_mode_size)));
>       }
>  
> -      /* If we are writing an accumulator register, we have to
> -      prime it after we've written it.  */
> -      if (TARGET_MMA && REG_P (dst)
> +      /* If we are writing an accumulator register, we have to prime it after
> +      we've written it unless we have dense math registers.  */
> +      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (dst)
>         && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>       emit_insn (gen_mma_xxmtacc (dst, dst));
>  
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 04709f0dcd6..5214a7c22ce 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -653,6 +653,7 @@ extern unsigned char rs6000_recip_bits[];
>  #define UNITS_PER_FP_WORD 8
>  #define UNITS_PER_ALTIVEC_WORD 16
>  #define UNITS_PER_VSX_WORD 16
> +#define UNITS_PER_DM_WORD 128
>  
>  /* Type used for ptrdiff_t, as a string used in a declaration.  */
>  #define PTRDIFF_TYPE "int"
> @@ -766,7 +767,7 @@ enum data_align { align_abi, align_opt, align_both };
>     Another pseudo (not included in DWARF_FRAME_REGISTERS) is soft frame
>     pointer, which is eventually eliminated in favor of SP or FP.  */
>  
> -#define FIRST_PSEUDO_REGISTER 111
> +#define FIRST_PSEUDO_REGISTER 119
>  
>  /* Use standard DWARF numbering for DWARF debugging information.  */
>  #define DEBUGGER_REGNO(REGNO) rs6000_debugger_regno ((REGNO), 0)
> @@ -803,7 +804,9 @@ enum data_align { align_abi, align_opt, align_both };
>     /* cr0..cr7 */                               \
>     0, 0, 0, 0, 0, 0, 0, 0,                      \
>     /* vrsave vscr sfp */                        \
> -   1, 1, 1                                      \
> +   1, 1, 1,                                     \
> +   /* Dense math registers.  */                         \
> +   0, 0, 0, 0, 0, 0, 0, 0                       \
>  }
>  
>  /* Like `CALL_USED_REGISTERS' except this macro doesn't require that
> @@ -827,7 +830,9 @@ enum data_align { align_abi, align_opt, align_both };
>     /* cr0..cr7 */                               \
>     1, 1, 0, 0, 0, 1, 1, 1,                      \
>     /* vrsave vscr sfp */                        \
> -   0, 0, 0                                      \
> +   0, 0, 0,                                     \
> +   /* Dense math registers.  */                         \
> +   0, 0, 0, 0, 0, 0, 0, 0                       \

A 0 here means non-volatile (callee-saved), right? ....

>  }
>  
>  #define TOTAL_ALTIVEC_REGS   (LAST_ALTIVEC_REGNO - FIRST_ALTIVEC_REGNO + 1)
> @@ -864,6 +869,7 @@ enum data_align { align_abi, align_opt, align_both };
>       v2              (not saved; incoming vector arg reg; return value)
>       v19 - v14       (not saved or used for anything)
>       v31 - v20       (saved; order given to save least number)
> +     dmr0 - dmr7     (not saved)

.... in which case it should be 'saved' here and not 'not saved'.

Also, all the other entries are from a higher register number to a lower
register number. So this should be dmr7 - dmr0.


>       vrsave, vscr    (fixed)
>       sfp             (fixed)
>  */
> @@ -906,6 +912,9 @@ enum data_align { align_abi, align_opt, align_both };
>     66,                                                               \
>     83, 82, 81, 80, 79, 78,                                   \
>     95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84,           \
> +   /* Dense math registers.  */                                      \
> +   111, 112, 113, 114, 115, 116, 117, 118,                   \

Here too, I believe the registers should be in decreasing order.
And should the entry for dense math registers occur below the entry
for register 110?

> +   /* Vrsave, vscr, sfp.  */                                 \
>     108, 109,                                                 \
>     110                                                               \
>  }
> @@ -932,6 +941,9 @@ enum data_align { align_abi, align_opt, align_both };
>  /* True if register is a VSX register.  */
>  #define VSX_REGNO_P(N) (FP_REGNO_P (N) || ALTIVEC_REGNO_P (N))
>  
> +/* True if register is a Dense math register.  */
> +#define DM_REGNO_P(N)        ((N) >= FIRST_DM_REGNO && (N) <= LAST_DM_REGNO)
> +
>  /* Alternate name for any vector register supporting floating point, no matter
>     which instruction set(s) are available.  */
>  #define VFLOAT_REGNO_P(N) \
> @@ -1069,6 +1081,7 @@ enum reg_class
>    FLOAT_REGS,
>    ALTIVEC_REGS,
>    VSX_REGS,
> +  DM_REGS,

s/DM_REGS/DMF_REGS/

>    VRSAVE_REGS,
>    VSCR_REGS,
>    GEN_OR_FLOAT_REGS,
> @@ -1098,6 +1111,7 @@ enum reg_class
>    "FLOAT_REGS",                                                      \
>    "ALTIVEC_REGS",                                                    \
>    "VSX_REGS",                                                        \
> +  "DM_REGS",                                                         \

Ditto

>    "VRSAVE_REGS",                                                     \
>    "VSCR_REGS",                                                       \
>    "GEN_OR_FLOAT_REGS",                                               \
> @@ -1132,6 +1146,8 @@ enum reg_class
>    { 0x00000000, 0x00000000, 0xffffffff, 0x00000000 },                        \
>    /* VSX_REGS.  */                                                   \
>    { 0x00000000, 0xffffffff, 0xffffffff, 0x00000000 },                        \
> +  /* DM_REGS.  */                                                    \

Ditto.

-Surya

> +  { 0x00000000, 0x00000000, 0x00000000, 0x007f8000 },                        \
>    /* VRSAVE_REGS.  */                                                        \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00001000 },                        \
>    /* VSCR_REGS.  */                                                  \
> @@ -1159,7 +1175,7 @@ enum reg_class
>    /* CA_REGS.  */                                                    \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000004 },                        \
>    /* ALL_REGS.  */                                                   \
> -  { 0xffffffff, 0xffffffff, 0xffffffff, 0x00007fff }                 \
> +  { 0xffffffff, 0xffffffff, 0xffffffff, 0x007fffff }                 \
>  }
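
The new DM_REGS and ALL_REGS masks look consistent with FIRST_DM_REGNO = 111
and LAST_DM_REGNO = 118. A quick standalone check, assuming (as the existing
VRSAVE_REGS entry for register 108 suggests) that bit N of word W stands for
hard register 32*W + N. regclass_mask is a made-up helper, not a GCC
function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fill WORDS with the four 32-bit mask words that
   cover hard registers FIRST..LAST inclusive, assuming bit N of word W
   represents register 32*W + N.  */
static void
regclass_mask (int first, int last, uint32_t words[4])
{
  assert (first >= 0 && first <= last && last < 128);

  for (int i = 0; i < 4; i++)
    words[i] = 0;

  for (int r = first; r <= last; r++)
    words[r / 32] |= UINT32_C (1) << (r % 32);
}
```

Calling it with 111 and 118 gives 0x007f8000 in the last word, and 0 through
118 gives 0x007fffff there, matching the two values in the patch.
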
>  
>  /* The same information, inverted:
> @@ -2060,7 +2076,16 @@ extern char rs6000_reg_names[][8];     /* register names (0 vs. %r0).  */
>    &rs6000_reg_names[108][0], /* vrsave  */                           \
>    &rs6000_reg_names[109][0], /* vscr  */                             \
>                                                                       \
> -  &rs6000_reg_names[110][0]  /* sfp  */                              \
> +  &rs6000_reg_names[110][0], /* sfp  */                              \
> +                                                                     \
> +  &rs6000_reg_names[111][0], /* dmr0  */                             \
> +  &rs6000_reg_names[112][0], /* dmr1  */                             \
> +  &rs6000_reg_names[113][0], /* dmr2  */                             \
> +  &rs6000_reg_names[114][0], /* dmr3  */                             \
> +  &rs6000_reg_names[115][0], /* dmr4  */                             \
> +  &rs6000_reg_names[116][0], /* dmr5  */                             \
> +  &rs6000_reg_names[117][0], /* dmr6  */                             \
> +  &rs6000_reg_names[118][0], /* dmr7  */                             \
>  }
>  
>  /* Table of additional register names to use in user input.  */
> @@ -2114,6 +2139,8 @@ extern char rs6000_reg_names[][8];      /* register names (0 vs. %r0).  */
>    {"vs52", 84}, {"vs53", 85}, {"vs54", 86}, {"vs55", 87},    \
>    {"vs56", 88}, {"vs57", 89}, {"vs58", 90}, {"vs59", 91},    \
>    {"vs60", 92}, {"vs61", 93}, {"vs62", 94}, {"vs63", 95},    \
> +  {"dmr0", 111}, {"dmr1", 112}, {"dmr2", 113}, {"dmr3", 114},        \
> +  {"dmr4", 115}, {"dmr5", 116}, {"dmr6", 117}, {"dmr7", 118},        \
>  }
>  
>  /* This is how to output an element of a case-vector that is relative.  */
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 3089551552c..57a239791ee 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -51,6 +51,8 @@ (define_constants
>     (VRSAVE_REGNO             108)
>     (VSCR_REGNO                       109)
>     (FRAME_POINTER_REGNUM     110)
> +   (FIRST_DM_REGNO           111)
> +   (LAST_DM_REGNO            118)
>    ])
>  
>  ;;
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 9f3519da77b..436309bb09c 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -639,6 +639,10 @@ mieee128-constant
>  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
>  Generate (do not generate) code that uses the LXVKQ instruction.
>  
> +mdense_math
> +Target Mask(DENSE_MATH) Var(rs6000_isa_flags)
> +Generate (do not generate) instructions that use dense math registers.
> +
>  ; Documented parameters
>  
>  -param=rs6000-vect-unroll-limit=
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 076113bc91a..a58c5e188e1 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -34384,6 +34384,13 @@ This option is enabled by default.
>  Enable or disable warnings about deprecated @samp{vector long ...} Altivec
>  type usage.  This option is enabled by default.
>  
> +@opindex mdense-math
> +@opindex mno-dense-math
> +@item -mdense-math
> +@itemx -mno-dense-math
> +Generate (do not generate) code that uses the dense math registers.
> +This option is enabled by default.
> +
>  @item --param rs6000-vect-unroll-limit=
>  The vectorizer checks with target information to determine whether it
>  would be beneficial to unroll the main vectorized loop and by how much.  This
