Re: [PATCH V4, 3/5] Add support for dense math registers.

Surya Kumari Jangala Fri, 06 Mar 2026 02:55:13 -0800

Hi Mike,
I raised a few comments on v3 of this patch, but they have not been addressed
in v4. I am posting them again below:


On 21/02/26 12:44 pm, Michael Meissner wrote:
> This patch adds basic support for dense math registers.  It includes support
> for moving values to/from dense registers.  The MMA instructions are not yet
> modified to know about dense math registers.  The -mcpu=future option does not
> set -mdense-math in this patch.  A future patch will make these changes.
> 
> The changes include:
> 
>    1: XOmode moves include moving to/from dense math registers.
> 
>    2: Add predicate dense_math_operand.
> 
>    3: Make the predicate accumulator_operand match on dense math registers.
> 
>    4: Add dense math register class.
> 
>    5: Add the 8 dense math register accumulators with internal register
>       numbers 111-118.
> 
>    6: Make the 'wD' constraint match a dense math register if -mdense-math is
>       in effect, and 4 adjacent VSX registers if -mno-dense-math is in effect.
> 
>    7: Set up the reload information so that the register allocator knows that
>       dense math registers do not have load or store instructions.  Instead, to
>       read/write dense math registers, you have to use VSX registers as
>       intermediaries.
> 
>    8: Make the print_operand '%A' output operand aware of accumulators in
>       dense math registers and accumulators in 4 adjacent VSX registers.
> 
>    9: Update register move and memory load/store costs for dense math
>       registers.
> 
>    10: Make dense math registers a pressure class for register allocation.
> 
>    11: Do not issue MMA deprime instructions if -mdense-math is in effect.
> 
>    12: Add support for dense math registers to rs6000_split_multireg_move.
> 
> The patches have been tested on both little and big endian systems.  Can I
> check it into the master branch?
> 
> This is version 4 of the patches.  The previous patches were:
> 
>  * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707452.html
>  * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707453.html
>  * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707454.html
>  * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707455.html
>  * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707456.html
> 
> gcc/
> 
> 2026-02-21   Michael Meissner  <[email protected]>
> 
>       * config/rs6000/mma.md (movxo): Convert to being a define_expand that
>       can handle both the original MMA support without dense math registers,
>       and adding dense math support.
>       (movxo_nodm): Rename original movxo insn, and restrict this insn to when
>       we do not have dense math registers.
>       (movxo_dm): New define_insn_and_split for dense math registers.
>       * config/rs6000/predicates.md (dense_math_operand): New predicate.
>       (accumulator_operand): Add support for dense math registers.
>       * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add dense math
>       register support.
>       (enum rs6000_reload_reg_type): Likewise.
>       (LAST_RELOAD_REG_CLASS): Likewise.
>       (reload_reg_map): Likewise.
>       (rs6000_reg_names): Likewise.
>       (alt_reg_names): Likewise.
>       (rs6000_hard_regno_nregs_internal): Likewise.
>       (rs6000_hard_regno_mode_ok_uncached): Likewise.
>       (rs6000_debug_reg_global): Likewise.
>       (rs6000_setup_reg_addr_masks): Likewise.
>       (rs6000_init_hard_regno_mode_ok): Likewise.
>       (rs6000_option_override_internal): Likewise.

I don't see a change to rs6000_option_override_internal in this patch.

>       (rs6000_secondary_reload_memory): Likewise.
>       (rs6000_secondary_reload_simple_move): Likewise.
>       (rs6000_preferred_reload_class): Likewise.
>       (rs6000_secondary_reload_class): Likewise.
>       (print_operand): Likewise.
>       (rs6000_dense_math_register_move_cost): New helper function.
>       (rs6000_register_move_cost): Add dense math register support.
>       (rs6000_memory_move_cost): Likewise.
>       (rs6000_compute_pressure_classes): Likewise.
>       (rs6000_debugger_regno): Likewise.
>       (rs6000_opt_masks): Likewise.
>       (rs6000_split_multireg_move): Likewise.
>       * config/rs6000/rs6000.h (UNITS_PER_DM_WORD): New macro.
>       (FIRST_PSEUDO_REGISTER): Add dense math register support.
>       (FIXED_REGISTERS): Likewise.
>       (CALL_REALLY_USED_REGISTERS): Likewise.
>       (REG_ALLOC_ORDER): Likewise.
>       (DM_REGNO_P): New macro.
>       (enum reg_class): Add dense math register support.
>       (REG_CLASS_NAMES): Likewise.
>       (REGISTER_NAMES): Likewise.
>       (ADDITIONAL_REGISTER_NAMES): Likewise.
>       * config/rs6000/rs6000.md (FIRST_DM_REGNO): New constant.
>       (LAST_DM_REGNO): Likewise.
> ---
>  gcc/config/rs6000/mma.md        |  37 +++++-
>  gcc/config/rs6000/predicates.md |  26 +++-
>  gcc/config/rs6000/rs6000.cc     | 213 ++++++++++++++++++++++++++------
>  gcc/config/rs6000/rs6000.h      |  37 +++++-
>  gcc/config/rs6000/rs6000.md     |   2 +
>  5 files changed, 263 insertions(+), 52 deletions(-)
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 77e7c633730..1813adbecd3 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -313,7 +313,7 @@ (define_insn_and_split "*movoo"
>     (set_attr "length" "*,*,8")])
>  
>  
> -;; Vector quad support.  XOmode can only live in FPRs.
> +;; Vector quad support.
>  (define_expand "movxo"
>    [(set (match_operand:XO 0 "nonimmediate_operand")
>       (match_operand:XO 1 "input_operand"))]
> @@ -338,10 +338,13 @@ (define_expand "movxo"
>      gcc_assert (false);
>  })
>  
> -(define_insn_and_split "*movxo"
> +;; If we do not have dense math registers, XOmode can only live in FPR
> +;; registers (0..31).
> +
> +(define_insn_and_split "*movxo_nodm"
>    [(set (match_operand:XO 0 "nonimmediate_operand" "=d,ZwO,d")
>       (match_operand:XO 1 "input_operand" "ZwO,d,d"))]
> -  "TARGET_MMA
> +  "TARGET_MMA && !TARGET_DENSE_MATH
>     && (gpc_reg_operand (operands[0], XOmode)
>         || gpc_reg_operand (operands[1], XOmode))"
>    "@
> @@ -358,6 +361,34 @@ (define_insn_and_split "*movxo"
>     (set_attr "length" "*,*,16")
>     (set_attr "max_prefixed_insns" "2,2,*")])
>  
> +;; If dense math registers are available, XOmode can live in either VSX
> +;; registers (0..63) or dense math registers.
> +
> +(define_insn_and_split "*movxo_dm"
> +  [(set (match_operand:XO 0 "nonimmediate_operand" "=wa,ZwO,wa,wD,wD,wa")
> +     (match_operand:XO 1 "input_operand"        "ZwO,wa, wa,wa,wD,wD"))]
> +  "TARGET_DENSE_MATH
> +   && (gpc_reg_operand (operands[0], XOmode)
> +       || gpc_reg_operand (operands[1], XOmode))"
> +  "@
> +   #
> +   #
> +   #
> +   dmxxinstdmr512 %0,%1,%Y1,0
> +   dmmr %0,%1
> +   dmxxextfdmr512 %0,%Y0,%1,0"
> +  "&& reload_completed
> +   && !dense_math_operand (operands[0], XOmode)
> +   && !dense_math_operand (operands[1], XOmode)"
> +  [(const_int 0)]
> +{
> +  rs6000_split_multireg_move (operands[0], operands[1]);
> +  DONE;
> +}
> +  [(set_attr "type" "vecload,vecstore,veclogical,mma,mma,mma")
> +   (set_attr "length" "*,*,16,*,*,*")
> +   (set_attr "max_prefixed_insns" "2,2,*,*,*,*")])
> +
>  (define_expand "vsx_assemble_pair"
>    [(match_operand:OO 0 "vsx_register_operand")
>     (match_operand:V16QI 1 "mma_assemble_input_operand")
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 682fd2dc6e8..5de81d54507 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -186,8 +186,26 @@ (define_predicate "vlogical_operand"
>    return VLOGICAL_REGNO_P (REGNO (op));
>  })
>  
> -;; Return 1 if op is an accumulator.  On power10 systems, the accumulators
> -;; overlap with the FPRs.
> +;; Return 1 if op is a dense math register
> +(define_predicate "dense_math_operand"
> +  (match_operand 0 "register_operand")
> +{
> +  if (!REG_P (op))
> +    return 0;
> +
> +  if (!HARD_REGISTER_P (op))
> +    return 1;
> +
> +  return DM_REGNO_P (REGNO (op));
> +})
> +
> +;; Return 1 if op is an accumulator.
> +;;
> +;; On power10 and power11 systems, the accumulators overlap with the
> +;; FPRs and the register must be divisible by 4.
> +;;
> +;; On systems with dense math registers, the accumulators are separate
> +;; registers and do not overlap with the FPR registers.
>  (define_predicate "accumulator_operand"
>    (match_operand 0 "register_operand")
>  {
> @@ -201,7 +219,9 @@ (define_predicate "accumulator_operand"
>      return 1;
>  
>    int r = REGNO (op);
> -  return FP_REGNO_P (r) && (r & 3) == 0;
> +  return (TARGET_DENSE_MATH
> +       ? DM_REGNO_P (r)
> +       : FP_REGNO_P (r) && (r & 3) == 0);
>  })
>  
>  ;; Return 1 if op is the carry register.
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 68d5e95179f..2587c00301f 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -292,7 +292,8 @@ enum rs6000_reg_type {
>    ALTIVEC_REG_TYPE,
>    FPR_REG_TYPE,
>    SPR_REG_TYPE,
> -  CR_REG_TYPE
> +  CR_REG_TYPE,
> +  DM_REG_TYPE
>  };
>  
>  /* Map register class to register type.  */
> @@ -306,22 +307,24 @@ static enum rs6000_reg_type reg_class_to_reg_type[N_REG_CLASSES];
>  
>  
>  /* Register classes we care about in secondary reload or go if legitimate
> -   address.  We only need to worry about GPR, FPR, and Altivec registers here,
> -   along an ANY field that is the OR of the 3 register classes.  */
> +   address.  We only need to worry about GPR, FPR, Altivec, and dense math
> +   registers here, along an ANY field that is the OR of the 4 register
> +   classes.  */
>  
>  enum rs6000_reload_reg_type {
>    RELOAD_REG_GPR,                    /* General purpose registers.  */
>    RELOAD_REG_FPR,                    /* Traditional floating point regs.  */
>    RELOAD_REG_VMX,                    /* Altivec (VMX) registers.  */
> -  RELOAD_REG_ANY,                    /* OR of GPR, FPR, Altivec masks.  */
> +  RELOAD_REG_DMR,                    /* Dense math registers.  */
> +  RELOAD_REG_ANY,                    /* OR of GPR/FPR/VMX/DMR masks.  */
>    N_RELOAD_REG
>  };
>  
> -/* For setting up register classes, loop through the 3 register classes mapping
> +/* For setting up register classes, loop through the 4 register classes mapping
>     into real registers, and skip the ANY class, which is just an OR of the
>     bits.  */
>  #define FIRST_RELOAD_REG_CLASS       RELOAD_REG_GPR
> -#define LAST_RELOAD_REG_CLASS        RELOAD_REG_VMX
> +#define LAST_RELOAD_REG_CLASS        RELOAD_REG_DMR
>  
>  /* Map reload register type to a register in the register class.  */
>  struct reload_reg_map_type {
> @@ -333,6 +336,7 @@ static const struct reload_reg_map_type reload_reg_map[N_RELOAD_REG] = {
>    { "Gpr",   FIRST_GPR_REGNO },      /* RELOAD_REG_GPR.  */
>    { "Fpr",   FIRST_FPR_REGNO },      /* RELOAD_REG_FPR.  */
>    { "VMX",   FIRST_ALTIVEC_REGNO },  /* RELOAD_REG_VMX.  */
> +  { "Dmr",   FIRST_DM_REGNO },       /* RELOAD_REG_DMR.  */
>    { "Any",   -1 },                   /* RELOAD_REG_ANY.  */
>  };
>  
> @@ -1226,6 +1230,8 @@ char rs6000_reg_names[][8] =
>        "0",  "1",  "2",  "3",  "4",  "5",  "6",  "7",
>    /* vrsave vscr sfp */
>        "vrsave", "vscr", "sfp",
> +  /* dense math registers.  */
> +      "0", "1", "2", "3", "4", "5", "6", "7",
>  };
>  
>  #ifdef TARGET_REGNAMES
> @@ -1252,6 +1258,8 @@ static const char alt_reg_names[][8] =
>    "%cr0",  "%cr1", "%cr2", "%cr3", "%cr4", "%cr5", "%cr6", "%cr7",
>    /* vrsave vscr sfp */
>    "vrsave", "vscr", "sfp",
> +  /* dense math registers.  */
> +  "%dmr0", "%dmr1", "%dmr2", "%dmr3", "%dmr4", "%dmr5", "%dmr6", "%dmr7",
>  };
>  #endif
>  
> @@ -1842,6 +1850,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
>    else if (ALTIVEC_REGNO_P (regno))
>      reg_size = UNITS_PER_ALTIVEC_WORD;
>  
> +  else if (DM_REGNO_P (regno))
> +    reg_size = UNITS_PER_DM_WORD;
> +
>    else
>      reg_size = UNITS_PER_WORD;
>  
> @@ -1863,9 +1874,32 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
>    if (mode == OOmode)
>      return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0);
>  
> -  /* MMA accumulator modes need FPR registers divisible by 4.  */
> +  /* On ISA 3.1 (power10), MMA accumulator modes need FPR registers divisible
> +     by 4.
> +
> +     If dense math registers are enabled, we can allow all VSX registers plus
> +     the dense math registers.  VSX registers are used to load and store the
> +     registers as the accumulator registers do not have load and store
> +     instructions.  Because we just use the VSX registers for load/store
> +     operations, we just need to make sure load vector pair and store vector
> +     pair instructions can be used.  */
>    if (mode == XOmode)
> -    return (TARGET_MMA && FP_REGNO_P (regno) && (regno & 3) == 0);
> +    {
> +      if (!TARGET_DENSE_MATH)

Here we assume that !TARGET_DENSE_MATH means -mcpu=power10. However, the
dense math facility can also be turned off with -mcpu=future, and any use of
MMA instructions in that case will result in an error.
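
To make the concern concrete, here is a minimal standalone sketch of the
XOmode check as written in this hunk. It is a model, not GCC code: the
register ranges (FPRs 32..63, Altivec 64..95, dense math 111..118), the
helper names, and the assumption that XOmode spans 4 VSX registers are all
chosen for illustration. What it shows is that the result depends only on the
dense math flag, so nothing in the check distinguishes power10 from
-mcpu=future with -mno-dense-math.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register ranges, used by this model only.  */
enum {
  FIRST_FPR = 32, LAST_FPR = 63,
  FIRST_VMX = 64, LAST_VMX = 95,
  FIRST_DM = 111, LAST_DM = 118
};

bool is_fpr (int r) { return r >= FIRST_FPR && r <= LAST_FPR; }
bool is_vsx (int r) { return r >= FIRST_FPR && r <= LAST_VMX; }
bool is_dm (int r) { return r >= FIRST_DM && r <= LAST_DM; }

/* Model of the XOmode branch; DENSE_MATH stands in for TARGET_DENSE_MATH.  */
bool
xo_regno_ok (int regno, bool dense_math)
{
  int last_regno = regno + 3;	/* Assumed: XOmode needs 4 VSX registers.  */

  /* Without dense math: only FPRs whose number is divisible by 4.  */
  if (!dense_math)
    return is_fpr (regno) && (regno & 3) == 0;

  /* With dense math: any dense math register is accepted.  */
  if (is_dm (regno))
    return true;

  /* Otherwise an even VSX register whose 4-register span stays in VSX.  */
  return is_vsx (regno) && is_vsx (last_regno) && (regno & 1) == 0;
}
```

In this model, xo_regno_ok (111, false) is rejected and only FPRs divisible
by 4 are accepted whenever the flag is off, whichever -mcpu was selected.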

> +     return (FP_REGNO_P (regno) && (regno & 3) == 0);
> +
> +      else if (DM_REGNO_P (regno))
> +     return 1;
> +
> +      else
> +     return (VSX_REGNO_P (regno)
> +             && VSX_REGNO_P (last_regno)
> +             && (regno & 1) == 0);
> +    }
> +
> +  /* No other types other than XOmode can go in dense math registers.  */
> +  if (DM_REGNO_P (regno))
> +    return 0;
>  
>    /* PTImode can only go in GPRs.  Quad word memory operations require even/odd
>       register combinations, and use PTImode where we need to deal with quad
> @@ -2308,6 +2342,7 @@ rs6000_debug_reg_global (void)
>    rs6000_debug_reg_print (FIRST_ALTIVEC_REGNO,
>                         LAST_ALTIVEC_REGNO,
>                         "vs");
> +  rs6000_debug_reg_print (FIRST_DM_REGNO, LAST_DM_REGNO, "dense_math");
>    rs6000_debug_reg_print (LR_REGNO, LR_REGNO, "lr");
>    rs6000_debug_reg_print (CTR_REGNO, CTR_REGNO, "ctr");
>    rs6000_debug_reg_print (CR0_REGNO, CR7_REGNO, "cr");
> @@ -2634,6 +2669,21 @@ rs6000_setup_reg_addr_masks (void)
>         addr_mask = 0;
>         reg = reload_reg_map[rc].reg;
>  
> +       /* Special case dense math registers.  */
> +       if (rc == RELOAD_REG_DMR)
> +         {
> +           if (TARGET_DENSE_MATH && m2 == XOmode)
> +             {
> +               addr_mask = RELOAD_REG_VALID;
> +               reg_addr[m].addr_mask[rc] = addr_mask;
> +               any_addr_mask |= addr_mask;
> +             }
> +           else
> +             reg_addr[m].addr_mask[rc] = 0;
> +
> +           continue;
> +         }
> +
>         /* Can mode values go in the GPR/FPR/Altivec registers?  */
>         if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg])
>           {
> @@ -2784,6 +2834,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>    for (r = CR1_REGNO; r <= CR7_REGNO; ++r)
>      rs6000_regno_regclass[r] = CR_REGS;
>  
> +  for (r = FIRST_DM_REGNO; r <= LAST_DM_REGNO; ++r)
> +    rs6000_regno_regclass[r] = DM_REGS;
> +
>    rs6000_regno_regclass[LR_REGNO] = LINK_REGS;
>    rs6000_regno_regclass[CTR_REGNO] = CTR_REGS;
>    rs6000_regno_regclass[CA_REGNO] = NO_REGS;
> @@ -2808,6 +2861,7 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>    reg_class_to_reg_type[(int)LINK_OR_CTR_REGS] = SPR_REG_TYPE;
>    reg_class_to_reg_type[(int)CR_REGS] = CR_REG_TYPE;
>    reg_class_to_reg_type[(int)CR0_REGS] = CR_REG_TYPE;
> +  reg_class_to_reg_type[(int)DM_REGS] = DM_REG_TYPE;
>  
>    if (TARGET_VSX)
>      {
> @@ -2994,8 +3048,11 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
>    if (TARGET_DIRECT_MOVE_128)
>      rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
>  
> +  /* Support for the accumulator registers, either FPR registers (aka original
> +     mma) or dense math registers.  */
>    if (TARGET_MMA)
> -    rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS;
> +    rs6000_constraints[RS6000_CONSTRAINT_wD]
> +      = TARGET_DENSE_MATH ? DM_REGS : FLOAT_REGS;
>  
>    /* Set up the reload helper and direct move functions.  */
>    if (TARGET_VSX || TARGET_ALTIVEC)
> @@ -12365,6 +12422,11 @@ rs6000_secondary_reload_memory (rtx addr,
>      addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_VMX]
>                & ~RELOAD_REG_AND_M16);
>  
> +  /* Dense math registers use VSX registers for memory operations, and need to
> +     generate some extra instructions.  */
> +  else if (rclass == DM_REGS)
> +    return 2;
> +
>    /* If the register allocator hasn't made up its mind yet on the register
>       class to use, settle on defaults to use.  */
>    else if (rclass == NO_REGS)
> @@ -12693,6 +12755,13 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
>              || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE)))
>      return true;
>  
> +  /* We can transfer between VSX registers and dense math registers without
> +     needing extra registers.  */
> +  if (TARGET_DENSE_MATH && mode == XOmode
> +      && ((to_type == DM_REG_TYPE && from_type == VSX_REG_TYPE)
> +       || (to_type == VSX_REG_TYPE && from_type == DM_REG_TYPE)))
> +    return true;
> +
>    return false;
>  }
>  
> @@ -13387,6 +13456,10 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
>    machine_mode mode = GET_MODE (x);
>    bool is_constant = CONSTANT_P (x);
>  
> +  /* Dense math registers can't be loaded or stored.  */
> +  if (rclass == DM_REGS)
> +    return NO_REGS;
> +
>    /* If a mode can't go in FPR/ALTIVEC/VSX registers, don't return a preferred
>       reload class for it.  */
>    if ((rclass == ALTIVEC_REGS || rclass == VSX_REGS)
> @@ -13483,7 +13556,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
>       return VSX_REGS;
>  
>        if (mode == XOmode)
> -     return FLOAT_REGS;
> +     return TARGET_DENSE_MATH ? VSX_REGS : FLOAT_REGS;
>  
>        if (GET_MODE_CLASS (mode) == MODE_INT)
>       return GENERAL_REGS;
> @@ -13608,6 +13681,11 @@ rs6000_secondary_reload_class (enum reg_class rclass, machine_mode mode,
>    else
>      regno = -1;
>  
> +  /* Dense math registers don't have loads or stores.  We have to go through
> +     the VSX registers to load XOmode (vector quad).  */
> +  if (TARGET_DENSE_MATH && rclass == DM_REGS)
> +    return VSX_REGS;
> +
>    /* If we have VSX register moves, prefer moving scalar values between
>       Altivec registers and GPR by going via an FPR (and then via memory)
>       instead of reloading the secondary memory address for Altivec moves.  */
> @@ -14139,8 +14217,14 @@ print_operand (FILE *file, rtx x, int code)
>        output_operand.  */
>  
>      case 'A':
> -      /* Write the MMA accumulator number associated with VSX register X.  */
> -      if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
> +      /* Write the MMA accumulator number associated with VSX register X.  On
> +      dense math systems, only allow dense math accumulators, not
> +      accumulators overlapping with the FPR registers.  */
> +      if (!REG_P (x))
> +     output_operand_lossage ("invalid %%A value");
> +      else if (TARGET_DENSE_MATH && DM_REGNO_P (REGNO (x)))
> +     fprintf (file, "%d", REGNO (x) - FIRST_DM_REGNO);
> +      else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
>       output_operand_lossage ("invalid %%A value");
>        else
>       fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4);
> @@ -22760,6 +22844,31 @@ rs6000_debug_address_cost (rtx x, machine_mode mode,
>  }
>  
>  
> +/* Subroutine to determine the move cost of dense math registers.  If we are
> +   moving to/from VSX_REGISTER registers, the cost is either 1 move (for
> +   512-bit accumulators) or 2 moves (for 1,024 dense math registers).  If we are
> +   moving to anything else like GPR registers, make the cost very high.  */
> +
> +static int
> +rs6000_dense_math_register_move_cost (machine_mode mode, reg_class_t rclass)
> +{
> +  const int reg_move_base = 2;
> +  HARD_REG_SET vsx_set = (reg_class_contents[rclass]
> +                       & reg_class_contents[VSX_REGS]);
> +
> +  if (TARGET_DENSE_MATH && !hard_reg_set_empty_p (vsx_set))
> +    {
> +      /* __vector_quad (i.e. XOmode) is tranfered in 1 instruction.  */
> +      if (mode == XOmode)
> +     return reg_move_base;
> +
> +      else
> +     return reg_move_base * 2 * hard_regno_nregs (FIRST_DM_REGNO, mode);
> +    }
> +
> +  return 1000 * 2 * hard_regno_nregs (FIRST_DM_REGNO, mode);
> +}
> +
>  /* A C expression returning the cost of moving data from a register of class
>     CLASS1 to one of CLASS2.  */
>  
> @@ -22773,17 +22882,28 @@ rs6000_register_move_cost (machine_mode mode,
>    if (TARGET_DEBUG_COST)
>      dbg_cost_ctrl++;
>  
> +  HARD_REG_SET to_vsx, from_vsx;
> +  to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
> +  from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
> +
> +  /* Special case dense math registers, that can only move to/from VSX registers.  */
> +  if (from == DM_REGS && to == DM_REGS)
> +    ret = 2 * hard_regno_nregs (FIRST_DM_REGNO, mode);
> +
> +  else if (from == DM_REGS)
> +    ret = rs6000_dense_math_register_move_cost (mode, to);
> +
> +  else if (to == DM_REGS)
> +    ret = rs6000_dense_math_register_move_cost (mode, from);
> +
>    /* If we have VSX, we can easily move between FPR or Altivec registers,
>       otherwise we can only easily move within classes.
>       Do this first so we give best-case answers for union classes
>       containing both gprs and vsx regs.  */
> -  HARD_REG_SET to_vsx, from_vsx;
> -  to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS];
> -  from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS];
> -  if (!hard_reg_set_empty_p (to_vsx)
> -      && !hard_reg_set_empty_p (from_vsx)
> -      && (TARGET_VSX
> -       || hard_reg_set_intersect_p (to_vsx, from_vsx)))
> +  else if (!hard_reg_set_empty_p (to_vsx)
> +        && !hard_reg_set_empty_p (from_vsx)
> +        && (TARGET_VSX
> +            || hard_reg_set_intersect_p (to_vsx, from_vsx)))
>      {
>        int reg = FIRST_FPR_REGNO;
>        if (TARGET_VSX
> @@ -22879,6 +22999,9 @@ rs6000_memory_move_cost (machine_mode mode, reg_class_t rclass,
>      ret = 4 * hard_regno_nregs (32, mode);
>    else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS))
>      ret = 4 * hard_regno_nregs (FIRST_ALTIVEC_REGNO, mode);
> +  else if (reg_classes_intersect_p (rclass, DM_REGS))
> +    ret = (rs6000_dense_math_register_move_cost (mode, VSX_REGS)
> +        + rs6000_memory_move_cost (mode, VSX_REGS, false));
>    else
>      ret = 4 + rs6000_register_move_cost (mode, rclass, GENERAL_REGS);
>  
> @@ -24087,6 +24210,8 @@ rs6000_compute_pressure_classes (enum reg_class *pressure_classes)
>        if (TARGET_HARD_FLOAT)
>       pressure_classes[n++] = FLOAT_REGS;
>      }
> +  if (TARGET_DENSE_MATH)
> +    pressure_classes[n++] = DM_REGS;
>    pressure_classes[n++] = CR_REGS;
>    pressure_classes[n++] = SPECIAL_REGS;
>  
> @@ -24251,6 +24376,10 @@ rs6000_debugger_regno (unsigned int regno, unsigned int format)
>      return 67;
>    if (regno == 64)
>      return 64;
> +  /* XXX: This is a guess.  The GCC register number for FIRST_DM_REGNO is 111,
> +     but the frame pointer regnum uses that.  */
> +  if (DM_REGNO_P (regno))
> +    return regno - FIRST_DM_REGNO + 112;
>  
>    gcc_unreachable ();
>  }
> @@ -27490,9 +27619,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>         unsigned offset = 0;
>         unsigned size = GET_MODE_SIZE (reg_mode);
>  
> -       /* If we are reading an accumulator register, we have to
> -          deprime it before we can access it.  */
> -       if (TARGET_MMA
> +       /* If we are reading an accumulator register, we have to deprime it
> +          before we can access it unless we have dense math registers.  */
> +       if (TARGET_MMA && !TARGET_DENSE_MATH
>             && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>           emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27524,9 +27653,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>             emit_insn (gen_rtx_SET (dst2, src2));
>           }
>  
> -       /* If we are writing an accumulator register, we have to
> -          prime it after we've written it.  */
> -       if (TARGET_MMA
> +       /* If we are writing an accumulator register, we have to prime it
> +          after we've written it unless we have dense math registers.  */
> +       if (TARGET_MMA && !TARGET_DENSE_MATH
>             && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>           emit_insn (gen_mma_xxmtacc (dst, dst));
>  
> @@ -27540,7 +27669,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>                     || XINT (src, 1) == UNSPECV_MMA_ASSEMBLE);
>         gcc_assert (REG_P (dst));
>         if (GET_MODE (src) == XOmode)
> -         gcc_assert (FP_REGNO_P (REGNO (dst)));
> +         gcc_assert ((TARGET_DENSE_MATH
> +                      ? VSX_REGNO_P (REGNO (dst))
> +                      : FP_REGNO_P (REGNO (dst))));
>         if (GET_MODE (src) == OOmode)
>           gcc_assert (VSX_REGNO_P (REGNO (dst)));
>  
> @@ -27593,9 +27724,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>             emit_insn (gen_rtx_SET (dst_i, op));
>           }
>  
> -       /* We are writing an accumulator register, so we have to
> -          prime it after we've written it.  */
> -       if (GET_MODE (src) == XOmode)
> +       /* We are writing an accumulator register, so we have to prime it
> +          after we've written it unless we have dense math registers.  */
> +       if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH)
>           emit_insn (gen_mma_xxmtacc (dst, dst));
>  
>         return;
> @@ -27606,9 +27737,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  
>    if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
>      {
> -      /* If we are reading an accumulator register, we have to
> -      deprime it before we can access it.  */
> -      if (TARGET_MMA
> +      /* If we are reading an accumulator register, we have to deprime it
> +      before we can access it unless we have dense math registers.  */
> +      if (TARGET_MMA && !TARGET_DENSE_MATH
>         && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>       emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27634,9 +27765,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>                                                        i * reg_mode_size)));
>       }
>  
> -      /* If we are writing an accumulator register, we have to
> -      prime it after we've written it.  */
> -      if (TARGET_MMA
> +      /* If we are writing an accumulator register, we have to prime it after
> +      we've written it unless we have dense math registers.  */
> +      if (TARGET_MMA && !TARGET_DENSE_MATH
>         && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>       emit_insn (gen_mma_xxmtacc (dst, dst));
>      }
> @@ -27771,9 +27902,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>           gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true));
>       }
>  
> -      /* If we are reading an accumulator register, we have to
> -      deprime it before we can access it.  */
> -      if (TARGET_MMA && REG_P (src)
> +      /* If we are reading an accumulator register, we have to deprime it
> +      before we can access it unless we have dense math registers.  */
> +      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (src)
>         && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>       emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27803,9 +27934,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>                                                        j * reg_mode_size)));
>       }
>  
> -      /* If we are writing an accumulator register, we have to
> -      prime it after we've written it.  */
> -      if (TARGET_MMA && REG_P (dst)
> +      /* If we are writing an accumulator register, we have to prime it after
> +      we've written it unless we have dense math registers.  */
> +      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (dst)
>         && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>       emit_insn (gen_mma_xxmtacc (dst, dst));
>  
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 04709f0dcd6..5214a7c22ce 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -653,6 +653,7 @@ extern unsigned char rs6000_recip_bits[];
>  #define UNITS_PER_FP_WORD 8
>  #define UNITS_PER_ALTIVEC_WORD 16
>  #define UNITS_PER_VSX_WORD 16
> +#define UNITS_PER_DM_WORD 128
>  
>  /* Type used for ptrdiff_t, as a string used in a declaration.  */
>  #define PTRDIFF_TYPE "int"
> @@ -766,7 +767,7 @@ enum data_align { align_abi, align_opt, align_both };
>     Another pseudo (not included in DWARF_FRAME_REGISTERS) is soft frame
>     pointer, which is eventually eliminated in favor of SP or FP.  */
>  
> -#define FIRST_PSEUDO_REGISTER 111
> +#define FIRST_PSEUDO_REGISTER 119
>  
>  /* Use standard DWARF numbering for DWARF debugging information.  */
>  #define DEBUGGER_REGNO(REGNO) rs6000_debugger_regno ((REGNO), 0)
> @@ -803,7 +804,9 @@ enum data_align { align_abi, align_opt, align_both };
>     /* cr0..cr7 */                               \
>     0, 0, 0, 0, 0, 0, 0, 0,                      \
>     /* vrsave vscr sfp */                        \
> -   1, 1, 1                                      \
> +   1, 1, 1,                                     \
> +   /* Dense math registers.  */                         \
> +   0, 0, 0, 0, 0, 0, 0, 0                       \
>  }
>  
>  /* Like `CALL_USED_REGISTERS' except this macro doesn't require that
> @@ -827,7 +830,9 @@ enum data_align { align_abi, align_opt, align_both };
>     /* cr0..cr7 */                               \
>     1, 1, 0, 0, 0, 1, 1, 1,                      \
>     /* vrsave vscr sfp */                        \
> -   0, 0, 0                                      \
> +   0, 0, 0,                                     \
> +   /* Dense math registers.  */                         \
> +   0, 0, 0, 0, 0, 0, 0, 0                       \
>  }
>  
>  #define TOTAL_ALTIVEC_REGS   (LAST_ALTIVEC_REGNO - FIRST_ALTIVEC_REGNO + 1)
> @@ -864,6 +869,7 @@ enum data_align { align_abi, align_opt, align_both };
>       v2              (not saved; incoming vector arg reg; return value)
>       v19 - v14       (not saved or used for anything)
>       v31 - v20       (saved; order given to save least number)
> +     dmr0 - dmr7     (not saved)

Shouldn't this be 'saved' here?

Also, all the other entries run from a higher register number to a lower
one, so this should be dmr7 - dmr0.

>       vrsave, vscr    (fixed)
>       sfp             (fixed)
>  */
> @@ -906,6 +912,9 @@ enum data_align { align_abi, align_opt, align_both };
>     66,                                                               \
>     83, 82, 81, 80, 79, 78,                                   \
>     95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84,           \
> +   /* Dense math registers.  */                                      \
> +   111, 112, 113, 114, 115, 116, 117, 118,                   \

Here too, I believe the registers should be in decreasing order.
And should the entry for dense math registers occur below the entry
for register 110?

-Surya

> +   /* Vrsave, vscr, sfp.  */                                 \
>     108, 109,                                                 \
>     110                                                               \
>  }
> @@ -932,6 +941,9 @@ enum data_align { align_abi, align_opt, align_both };
>  /* True if register is a VSX register.  */
>  #define VSX_REGNO_P(N) (FP_REGNO_P (N) || ALTIVEC_REGNO_P (N))
>  
> +/* True if register is a Dense math register.  */
> +#define DM_REGNO_P(N)        ((N) >= FIRST_DM_REGNO && (N) <= LAST_DM_REGNO)
> +
>  /* Alternate name for any vector register supporting floating point, no matter
>     which instruction set(s) are available.  */
>  #define VFLOAT_REGNO_P(N) \
> @@ -1069,6 +1081,7 @@ enum reg_class
>    FLOAT_REGS,
>    ALTIVEC_REGS,
>    VSX_REGS,
> +  DM_REGS,
>    VRSAVE_REGS,
>    VSCR_REGS,
>    GEN_OR_FLOAT_REGS,
> @@ -1098,6 +1111,7 @@ enum reg_class
>    "FLOAT_REGS",                                                      \
>    "ALTIVEC_REGS",                                                    \
>    "VSX_REGS",                                                        \
> +  "DM_REGS",                                                         \
>    "VRSAVE_REGS",                                                     \
>    "VSCR_REGS",                                                       \
>    "GEN_OR_FLOAT_REGS",                                               \
> @@ -1132,6 +1146,8 @@ enum reg_class
>    { 0x00000000, 0x00000000, 0xffffffff, 0x00000000 },                \
>    /* VSX_REGS.  */                                                   \
>    { 0x00000000, 0xffffffff, 0xffffffff, 0x00000000 },                \
> +  /* DM_REGS.  */                                                    \
> +  { 0x00000000, 0x00000000, 0x00000000, 0x007f8000 },                \
>    /* VRSAVE_REGS.  */                                                \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00001000 },                \
>    /* VSCR_REGS.  */                                                  \
> @@ -1159,7 +1175,7 @@ enum reg_class
>    /* CA_REGS.  */                                                    \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000004 },                \
>    /* ALL_REGS.  */                                                   \
> -  { 0xffffffff, 0xffffffff, 0xffffffff, 0x00007fff }                 \
> +  { 0xffffffff, 0xffffffff, 0xffffffff, 0x007fffff }                 \
>  }
>  
>  /* The same information, inverted:
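
As a sanity check on the new masks, here is a small standalone sketch that
rebuilds word 3 of the class contents from register numbers 111-118. It is
not GCC code: mask_word is a hypothetical helper, and the layout of 32
registers per word is assumed from the four-word initializers above.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the rs6000.md hunk in this patch.  */
#define FIRST_DM_REGNO 111
#define LAST_DM_REGNO  118

/* Same definition as the macro added to rs6000.h in this patch.  */
#define DM_REGNO_P(N) ((N) >= FIRST_DM_REGNO && (N) <= LAST_DM_REGNO)

/* Hypothetical helper (not part of GCC): return the 32-bit word at index
   WORD of a register-class mask covering registers FIRST..LAST inclusive,
   assuming register N lives in word N / 32 at bit N % 32.  */
uint32_t
mask_word (int first, int last, int word)
{
  uint32_t mask = 0;

  for (int regno = first; regno <= last; regno++)
    if (regno / 32 == word)
      mask |= (uint32_t) 1 << (regno % 32);

  return mask;
}
```

Under those assumptions, mask_word (FIRST_DM_REGNO, LAST_DM_REGNO, 3) gives
0x007f8000 and mask_word (96, LAST_DM_REGNO, 3) gives 0x007fffff, which
matches the DM_REGS and ALL_REGS values in the patch.
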
> @@ -2060,7 +2076,16 @@ extern char rs6000_reg_names[][8];     /* register names (0 vs. %r0).  */
>    &rs6000_reg_names[108][0], /* vrsave  */                           \
>    &rs6000_reg_names[109][0], /* vscr  */                             \
>                                                                       \
> -  &rs6000_reg_names[110][0]  /* sfp  */                              \
> +  &rs6000_reg_names[110][0], /* sfp  */                              \
> +                                                                     \
> +  &rs6000_reg_names[111][0], /* dmr0  */                             \
> +  &rs6000_reg_names[112][0], /* dmr1  */                             \
> +  &rs6000_reg_names[113][0], /* dmr2  */                             \
> +  &rs6000_reg_names[114][0], /* dmr3  */                             \
> +  &rs6000_reg_names[115][0], /* dmr4  */                             \
> +  &rs6000_reg_names[116][0], /* dmr5  */                             \
> +  &rs6000_reg_names[117][0], /* dmr6  */                             \
> +  &rs6000_reg_names[118][0], /* dmr7  */                             \
>  }
>  
>  /* Table of additional register names to use in user input.  */
> @@ -2114,6 +2139,8 @@ extern char rs6000_reg_names[][8];      /* register names (0 vs. %r0).  */
>    {"vs52", 84}, {"vs53", 85}, {"vs54", 86}, {"vs55", 87},    \
>    {"vs56", 88}, {"vs57", 89}, {"vs58", 90}, {"vs59", 91},    \
>    {"vs60", 92}, {"vs61", 93}, {"vs62", 94}, {"vs63", 95},    \
> +  {"dmr0", 111}, {"dmr1", 112}, {"dmr2", 113}, {"dmr3", 114},        \
> +  {"dmr4", 115}, {"dmr5", 116}, {"dmr6", 117}, {"dmr7", 118},        \
>  }
>  
>  /* This is how to output an element of a case-vector that is relative.  */
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 3089551552c..57a239791ee 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -51,6 +51,8 @@ (define_constants
>     (VRSAVE_REGNO             108)
>     (VSCR_REGNO                       109)
>     (FRAME_POINTER_REGNUM     110)
> +   (FIRST_DM_REGNO           111)
> +   (LAST_DM_REGNO            118)
>    ])
>  
>  ;;
