Hi Mike, I raised a few comments on v3 of the patch, but they have not been addressed in v4. I am posting them again below:
On 21/02/26 12:44 pm, Michael Meissner wrote:
> This patch adds basic support for dense math registers. It includes support for
> moving values to/from dense registers. The MMA instructions are not yet
> modified to know about dense math registers. The -mcpu=future option does not
> set -mdense-math in this patch. A future patch will make these changes.
>
> The changes include:
>
> 1: XOmode moves include moving to/from dense math registers.
>
> 2: Add predicate dense_math_operand.
>
> 3: Make the predicate accumulator_operand match on dense math registers.
>
> 4: Add dense math register class.
>
> 5: Add the 8 dense math register accumulators with internal register
> numbers 111-118.
>
> 6: Make the 'wD' constraint match dense math register if -mdense-math, and
> 4 adjacent VSX register if -mno-dense-math is in effect.
>
> 7: Set up the reload information so that the register allocator knows that
> dense math registers do not have load or store instructions. Instead to
> read/write dense math registers, you have to use VSX registers as
> intermediaries.
>
> 8: Make the print_operand '%A' output operand now knows about accumulators
> in dense math registrs and accumulators in 4 adjacent VSX registers.
>
> 9: Update register move and memmory load/store costs for dense math
> registers.
>
> 10: Make dense math registers a pressure class for register
> allocation.
>
> 11: Do not issue MMA deprime instructions if -mdense-math is in
> effect.
>
> 12: Add support for dense math registers to
> rs6000_split_multireg_move.
>
> The patches have been tested on both little and big endian systems. Can I check
> it into the master branch?
>
> This is version 4 of the patches.
> The previous patches were:
>
> * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707452.html
> * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707453.html
> * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707454.html
> * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707455.html
> * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707456.html
>
> gcc/
>
> 2026-02-21 Michael Meissner <[email protected]>
>
> * config/rs6000/mma.md (movxo): Convert to being a define_expand that
> can handle both the original MMA support without dense math registes,
> and adding dense math support.
> (movxo_nodm): Rename original movxo insn, and restrict this insn to when
> we do not have dense math registers.
> (movxo_dm): New define_insn_and_split for dense math registers.
> * config/rs6000/predicates.md (dense_math_operand): New predicate.
> (accumulator_operand): Add support for dense math registes.
> * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add dense math
> register support.
> (enum rs6000_reload_reg_typ): Likewise.
> (LAST_RELOAD_REG_CLASS): Likewise.
> (reload_reg_map): Likewise.
> (rs6000_reg_names): Likewise.
> (alt_reg_names): Likewise.
> (rs6000_hard_regno_nregs_internal): Likewise.
> (rs6000_hard_regno_mode_ok_uncached): Likewise.
> (rs6000_debug_reg_global): Likewise.
> (rs6000_setup_reg_addr_masks): Likewise.
> (rs6000_init_hard_regno_mode_ok): Likewise.
> (rs6000_option_override_internal): Likewise.

I don't see any change to rs6000_option_override_internal in this patch.

> (rs6000_secondary_reload_memory): Likewise.
> (rs6000_secondary_reload_simple_move): Likewise.
> (rs6000_preferred_reload_class): Likewise.
> (rs6000_secondary_reload_class): Likewise.
> (print_operand): Likewise.
> (rs6000_dense_math_register_move_cost): New helper function.
> (rs6000_register_move_cost): Add dense math register support.
> (rs6000_memory_move_cost): Likewise.
> (rs6000_compute_pressure_classes): Likewise.
> (rs6000_debugger_regno): Likewise.
> (rs6000_opt_masks): Likewise. > (rs6000_split_multireg_move): Likewise. > * config/rs6000/rs6000.h (UNITS_PER_DM_WORD): New macro. > (FIRST_PSEUDO_REGISTER): Add dense math register support. > (FIXED_REGISTERS): Likewise. > (CALL_REALLY_USED_REGISTERS): Likewise. > (REG_ALLOC_ORDER): Likewise. > (DM_REGNO_P): New macro. > (enum reg_class): Add dense math register support. > (REG_CLASS_NAMES): Likewise. > (REGISTER_NAMES): Likewise. > (ADDITIONAL_REGISTER_NAMES): Likewise. > * config/rs6000/rs6000.md (FIRST_DM_REGNO): New constant. > (LAST_DM_REGNO): Likewise. > --- > gcc/config/rs6000/mma.md | 37 +++++- > gcc/config/rs6000/predicates.md | 26 +++- > gcc/config/rs6000/rs6000.cc | 213 ++++++++++++++++++++++++++------ > gcc/config/rs6000/rs6000.h | 37 +++++- > gcc/config/rs6000/rs6000.md | 2 + > 5 files changed, 263 insertions(+), 52 deletions(-) > > diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md > index 77e7c633730..1813adbecd3 100644 > --- a/gcc/config/rs6000/mma.md > +++ b/gcc/config/rs6000/mma.md > @@ -313,7 +313,7 @@ (define_insn_and_split "*movoo" > (set_attr "length" "*,*,8")]) > > > -;; Vector quad support. XOmode can only live in FPRs. > +;; Vector quad support. > (define_expand "movxo" > [(set (match_operand:XO 0 "nonimmediate_operand") > (match_operand:XO 1 "input_operand"))] > @@ -338,10 +338,13 @@ (define_expand "movxo" > gcc_assert (false); > }) > > -(define_insn_and_split "*movxo" > +;; If we do not have dense math registers, XOmode can only live in FPR > +;; registers (0..31). 
> + > +(define_insn_and_split "*movxo_nodm" > [(set (match_operand:XO 0 "nonimmediate_operand" "=d,ZwO,d") > (match_operand:XO 1 "input_operand" "ZwO,d,d"))] > - "TARGET_MMA > + "TARGET_MMA && !TARGET_DENSE_MATH > && (gpc_reg_operand (operands[0], XOmode) > || gpc_reg_operand (operands[1], XOmode))" > "@ > @@ -358,6 +361,34 @@ (define_insn_and_split "*movxo" > (set_attr "length" "*,*,16") > (set_attr "max_prefixed_insns" "2,2,*")]) > > +;; If dense math registers are available, XOmode can live in either VSX > +;; registers (0..63) or dense math registers. > + > +(define_insn_and_split "*movxo_dm" > + [(set (match_operand:XO 0 "nonimmediate_operand" "=wa,ZwO,wa,wD,wD,wa") > + (match_operand:XO 1 "input_operand" "ZwO,wa, wa,wa,wD,wD"))] > + "TARGET_DENSE_MATH > + && (gpc_reg_operand (operands[0], XOmode) > + || gpc_reg_operand (operands[1], XOmode))" > + "@ > + # > + # > + # > + dmxxinstdmr512 %0,%1,%Y1,0 > + dmmr %0,%1 > + dmxxextfdmr512 %0,%Y0,%1,0" > + "&& reload_completed > + && !dense_math_operand (operands[0], XOmode) > + && !dense_math_operand (operands[1], XOmode)" > + [(const_int 0)] > +{ > + rs6000_split_multireg_move (operands[0], operands[1]); > + DONE; > +} > + [(set_attr "type" "vecload,vecstore,veclogical,mma,mma,mma") > + (set_attr "length" "*,*,16,*,*,*") > + (set_attr "max_prefixed_insns" "2,2,*,*,*,*")]) > + > (define_expand "vsx_assemble_pair" > [(match_operand:OO 0 "vsx_register_operand") > (match_operand:V16QI 1 "mma_assemble_input_operand") > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index 682fd2dc6e8..5de81d54507 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -186,8 +186,26 @@ (define_predicate "vlogical_operand" > return VLOGICAL_REGNO_P (REGNO (op)); > }) > > -;; Return 1 if op is an accumulator. On power10 systems, the accumulators > -;; overlap with the FPRs. 
> +;; Return 1 if op is a dense math register > +(define_predicate "dense_math_operand" > + (match_operand 0 "register_operand") > +{ > + if (!REG_P (op)) > + return 0; > + > + if (!HARD_REGISTER_P (op)) > + return 1; > + > + return DM_REGNO_P (REGNO (op)); > +}) > + > +;; Return 1 if op is an accumulator. > +;; > +;; On power10 and power11 systems, the accumulators overlap with the > +;; FPRs and the register must be divisible by 4. > +;; > +;; On systems with dense math registers, the accumulators are separate > +;; registers and do not overlap with the FPR registers. > (define_predicate "accumulator_operand" > (match_operand 0 "register_operand") > { > @@ -201,7 +219,9 @@ (define_predicate "accumulator_operand" > return 1; > > int r = REGNO (op); > - return FP_REGNO_P (r) && (r & 3) == 0; > + return (TARGET_DENSE_MATH > + ? DM_REGNO_P (r) > + : FP_REGNO_P (r) && (r & 3) == 0); > }) > > ;; Return 1 if op is the carry register. > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 68d5e95179f..2587c00301f 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -292,7 +292,8 @@ enum rs6000_reg_type { > ALTIVEC_REG_TYPE, > FPR_REG_TYPE, > SPR_REG_TYPE, > - CR_REG_TYPE > + CR_REG_TYPE, > + DM_REG_TYPE > }; > > /* Map register class to register type. */ > @@ -306,22 +307,24 @@ static enum rs6000_reg_type > reg_class_to_reg_type[N_REG_CLASSES]; > > > /* Register classes we care about in secondary reload or go if legitimate > - address. We only need to worry about GPR, FPR, and Altivec registers > here, > - along an ANY field that is the OR of the 3 register classes. */ > + address. We only need to worry about GPR, FPR, Altivec, and dense math > + registers here, along an ANY field that is the OR of the 4 register > + classes. */ > > enum rs6000_reload_reg_type { > RELOAD_REG_GPR, /* General purpose registers. */ > RELOAD_REG_FPR, /* Traditional floating point regs. */ > RELOAD_REG_VMX, /* Altivec (VMX) registers. 
*/ > - RELOAD_REG_ANY, /* OR of GPR, FPR, Altivec masks. */ > + RELOAD_REG_DMR, /* Dense math registers. */ > + RELOAD_REG_ANY, /* OR of GPR/FPR/VMX/DMR masks. */ > N_RELOAD_REG > }; > > -/* For setting up register classes, loop through the 3 register classes > mapping > +/* For setting up register classes, loop through the 4 register classes > mapping > into real registers, and skip the ANY class, which is just an OR of the > bits. */ > #define FIRST_RELOAD_REG_CLASS RELOAD_REG_GPR > -#define LAST_RELOAD_REG_CLASS RELOAD_REG_VMX > +#define LAST_RELOAD_REG_CLASS RELOAD_REG_DMR > > /* Map reload register type to a register in the register class. */ > struct reload_reg_map_type { > @@ -333,6 +336,7 @@ static const struct reload_reg_map_type > reload_reg_map[N_RELOAD_REG] = { > { "Gpr", FIRST_GPR_REGNO }, /* RELOAD_REG_GPR. */ > { "Fpr", FIRST_FPR_REGNO }, /* RELOAD_REG_FPR. */ > { "VMX", FIRST_ALTIVEC_REGNO }, /* RELOAD_REG_VMX. */ > + { "Dmr", FIRST_DM_REGNO }, /* RELOAD_REG_DMR. */ > { "Any", -1 }, /* RELOAD_REG_ANY. */ > }; > > @@ -1226,6 +1230,8 @@ char rs6000_reg_names[][8] = > "0", "1", "2", "3", "4", "5", "6", "7", > /* vrsave vscr sfp */ > "vrsave", "vscr", "sfp", > + /* dense math registers. */ > + "0", "1", "2", "3", "4", "5", "6", "7", > }; > > #ifdef TARGET_REGNAMES > @@ -1252,6 +1258,8 @@ static const char alt_reg_names[][8] = > "%cr0", "%cr1", "%cr2", "%cr3", "%cr4", "%cr5", "%cr6", "%cr7", > /* vrsave vscr sfp */ > "vrsave", "vscr", "sfp", > + /* dense math registers. 
*/
> +  "%dmr0", "%dmr1", "%dmr2", "%dmr3", "%dmr4", "%dmr5", "%dmr6", "%dmr7",
>  };
>  #endif
>
> @@ -1842,6 +1850,9 @@ rs6000_hard_regno_nregs_internal (int regno, machine_mode mode)
>    else if (ALTIVEC_REGNO_P (regno))
>      reg_size = UNITS_PER_ALTIVEC_WORD;
>
> +  else if (DM_REGNO_P (regno))
> +    reg_size = UNITS_PER_DM_WORD;
> +
>    else
>      reg_size = UNITS_PER_WORD;
>
> @@ -1863,9 +1874,32 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
>    if (mode == OOmode)
>      return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0);
>
> -  /* MMA accumulator modes need FPR registers divisible by 4. */
> +  /* On ISA 3.1 (power10), MMA accumulator modes need FPR registers divisible
> +     by 4.
> +
> +     If dense math registers are enabled, we can allow all VSX registers plus
> +     the dense math registers. VSX registers are used to load and store the
> +     registers as the accumulator registers do not have load and store
> +     instructions. Because we just use the VSX registers for load/store
> +     operations, we just need to make sure load vector pair and store vector
> +     pair instructions can be used. */
>    if (mode == XOmode)
> -    return (TARGET_MMA && FP_REGNO_P (regno) && (regno & 3) == 0);
> +    {
> +      if (!TARGET_DENSE_MATH)

Here, we are assuming that !TARGET_DENSE_MATH means -mcpu is power10.
However, for -mcpu=future, we can turn off the dense math facility. Any
use of MMA instructions in this case will throw an error.

> +        return (FP_REGNO_P (regno) && (regno & 3) == 0);
> +
> +      else if (DM_REGNO_P (regno))
> +        return 1;
> +
> +      else
> +        return (VSX_REGNO_P (regno)
> +                && VSX_REGNO_P (last_regno)
> +                && (regno & 1) == 0);
> +    }
> +
> +  /* No other types other than XOmode can go in dense math registers. */
> +  if (DM_REGNO_P (regno))
> +    return 0;
>
>    /* PTImode can only go in GPRs.
Quad word memory operations require > even/odd > register combinations, and use PTImode where we need to deal with quad > @@ -2308,6 +2342,7 @@ rs6000_debug_reg_global (void) > rs6000_debug_reg_print (FIRST_ALTIVEC_REGNO, > LAST_ALTIVEC_REGNO, > "vs"); > + rs6000_debug_reg_print (FIRST_DM_REGNO, LAST_DM_REGNO, "dense_math"); > rs6000_debug_reg_print (LR_REGNO, LR_REGNO, "lr"); > rs6000_debug_reg_print (CTR_REGNO, CTR_REGNO, "ctr"); > rs6000_debug_reg_print (CR0_REGNO, CR7_REGNO, "cr"); > @@ -2634,6 +2669,21 @@ rs6000_setup_reg_addr_masks (void) > addr_mask = 0; > reg = reload_reg_map[rc].reg; > > + /* Special case dense math registers. */ > + if (rc == RELOAD_REG_DMR) > + { > + if (TARGET_DENSE_MATH && m2 == XOmode) > + { > + addr_mask = RELOAD_REG_VALID; > + reg_addr[m].addr_mask[rc] = addr_mask; > + any_addr_mask |= addr_mask; > + } > + else > + reg_addr[m].addr_mask[rc] = 0; > + > + continue; > + } > + > /* Can mode values go in the GPR/FPR/Altivec registers? */ > if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg]) > { > @@ -2784,6 +2834,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) > for (r = CR1_REGNO; r <= CR7_REGNO; ++r) > rs6000_regno_regclass[r] = CR_REGS; > > + for (r = FIRST_DM_REGNO; r <= LAST_DM_REGNO; ++r) > + rs6000_regno_regclass[r] = DM_REGS; > + > rs6000_regno_regclass[LR_REGNO] = LINK_REGS; > rs6000_regno_regclass[CTR_REGNO] = CTR_REGS; > rs6000_regno_regclass[CA_REGNO] = NO_REGS; > @@ -2808,6 +2861,7 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) > reg_class_to_reg_type[(int)LINK_OR_CTR_REGS] = SPR_REG_TYPE; > reg_class_to_reg_type[(int)CR_REGS] = CR_REG_TYPE; > reg_class_to_reg_type[(int)CR0_REGS] = CR_REG_TYPE; > + reg_class_to_reg_type[(int)DM_REGS] = DM_REG_TYPE; > > if (TARGET_VSX) > { > @@ -2994,8 +3048,11 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) > if (TARGET_DIRECT_MOVE_128) > rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS; > > + /* Support for the accumulator registers, either FPR registers (aka > 
original > + mma) or dense math registers. */ > if (TARGET_MMA) > - rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS; > + rs6000_constraints[RS6000_CONSTRAINT_wD] > + = TARGET_DENSE_MATH ? DM_REGS : FLOAT_REGS; > > /* Set up the reload helper and direct move functions. */ > if (TARGET_VSX || TARGET_ALTIVEC) > @@ -12365,6 +12422,11 @@ rs6000_secondary_reload_memory (rtx addr, > addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_VMX] > & ~RELOAD_REG_AND_M16); > > + /* Dense math registers use VSX registers for memory operations, and need > to > + generate some extra instructions. */ > + else if (rclass == DM_REGS) > + return 2; > + > /* If the register allocator hasn't made up its mind yet on the register > class to use, settle on defaults to use. */ > else if (rclass == NO_REGS) > @@ -12693,6 +12755,13 @@ rs6000_secondary_reload_simple_move (enum > rs6000_reg_type to_type, > || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE))) > return true; > > + /* We can transfer between VSX registers and dense math registers without > + needing extra registers. */ > + if (TARGET_DENSE_MATH && mode == XOmode > + && ((to_type == DM_REG_TYPE && from_type == VSX_REG_TYPE) > + || (to_type == VSX_REG_TYPE && from_type == DM_REG_TYPE))) > + return true; > + > return false; > } > > @@ -13387,6 +13456,10 @@ rs6000_preferred_reload_class (rtx x, enum reg_class > rclass) > machine_mode mode = GET_MODE (x); > bool is_constant = CONSTANT_P (x); > > + /* Dense math registers can't be loaded or stored. */ > + if (rclass == DM_REGS) > + return NO_REGS; > + > /* If a mode can't go in FPR/ALTIVEC/VSX registers, don't return a > preferred > reload class for it. */ > if ((rclass == ALTIVEC_REGS || rclass == VSX_REGS) > @@ -13483,7 +13556,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class > rclass) > return VSX_REGS; > > if (mode == XOmode) > - return FLOAT_REGS; > + return TARGET_DENSE_MATH ? 
VSX_REGS : FLOAT_REGS; > > if (GET_MODE_CLASS (mode) == MODE_INT) > return GENERAL_REGS; > @@ -13608,6 +13681,11 @@ rs6000_secondary_reload_class (enum reg_class > rclass, machine_mode mode, > else > regno = -1; > > + /* Dense math registers don't have loads or stores. We have to go through > + the VSX registers to load XOmode (vector quad). */ > + if (TARGET_DENSE_MATH && rclass == DM_REGS) > + return VSX_REGS; > + > /* If we have VSX register moves, prefer moving scalar values between > Altivec registers and GPR by going via an FPR (and then via memory) > instead of reloading the secondary memory address for Altivec moves. */ > @@ -14139,8 +14217,14 @@ print_operand (FILE *file, rtx x, int code) > output_operand. */ > > case 'A': > - /* Write the MMA accumulator number associated with VSX register X. */ > - if (!REG_P (x) || !FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0) > + /* Write the MMA accumulator number associated with VSX register X. On > + dense math systems, only allow dense math accumulators, not > + accumulators overlapping with the FPR registers. */ > + if (!REG_P (x)) > + output_operand_lossage ("invalid %%A value"); > + else if (TARGET_DENSE_MATH && DM_REGNO_P (REGNO (x))) > + fprintf (file, "%d", REGNO (x) - FIRST_DM_REGNO); > + else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0) > output_operand_lossage ("invalid %%A value"); > else > fprintf (file, "%d", (REGNO (x) - FIRST_FPR_REGNO) / 4); > @@ -22760,6 +22844,31 @@ rs6000_debug_address_cost (rtx x, machine_mode mode, > } > > > +/* Subroutine to determine the move cost of dense math registers. If we are > + moving to/from VSX_REGISTER registers, the cost is either 1 move (for > + 512-bit accumulators) or 2 moves (for 1,024 dense math registers). If we > are > + moving to anything else like GPR registers, make the cost very high. 
*/ > + > +static int > +rs6000_dense_math_register_move_cost (machine_mode mode, reg_class_t rclass) > +{ > + const int reg_move_base = 2; > + HARD_REG_SET vsx_set = (reg_class_contents[rclass] > + & reg_class_contents[VSX_REGS]); > + > + if (TARGET_DENSE_MATH && !hard_reg_set_empty_p (vsx_set)) > + { > + /* __vector_quad (i.e. XOmode) is tranfered in 1 instruction. */ > + if (mode == XOmode) > + return reg_move_base; > + > + else > + return reg_move_base * 2 * hard_regno_nregs (FIRST_DM_REGNO, mode); > + } > + > + return 1000 * 2 * hard_regno_nregs (FIRST_DM_REGNO, mode); > +} > + > /* A C expression returning the cost of moving data from a register of class > CLASS1 to one of CLASS2. */ > > @@ -22773,17 +22882,28 @@ rs6000_register_move_cost (machine_mode mode, > if (TARGET_DEBUG_COST) > dbg_cost_ctrl++; > > + HARD_REG_SET to_vsx, from_vsx; > + to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS]; > + from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS]; > + > + /* Special case dense math registers, that can only move to/from VSX > registers. */ > + if (from == DM_REGS && to == DM_REGS) > + ret = 2 * hard_regno_nregs (FIRST_DM_REGNO, mode); > + > + else if (from == DM_REGS) > + ret = rs6000_dense_math_register_move_cost (mode, to); > + > + else if (to == DM_REGS) > + ret = rs6000_dense_math_register_move_cost (mode, from); > + > /* If we have VSX, we can easily move between FPR or Altivec registers, > otherwise we can only easily move within classes. > Do this first so we give best-case answers for union classes > containing both gprs and vsx regs. 
*/ > - HARD_REG_SET to_vsx, from_vsx; > - to_vsx = reg_class_contents[to] & reg_class_contents[VSX_REGS]; > - from_vsx = reg_class_contents[from] & reg_class_contents[VSX_REGS]; > - if (!hard_reg_set_empty_p (to_vsx) > - && !hard_reg_set_empty_p (from_vsx) > - && (TARGET_VSX > - || hard_reg_set_intersect_p (to_vsx, from_vsx))) > + else if (!hard_reg_set_empty_p (to_vsx) > + && !hard_reg_set_empty_p (from_vsx) > + && (TARGET_VSX > + || hard_reg_set_intersect_p (to_vsx, from_vsx))) > { > int reg = FIRST_FPR_REGNO; > if (TARGET_VSX > @@ -22879,6 +22999,9 @@ rs6000_memory_move_cost (machine_mode mode, > reg_class_t rclass, > ret = 4 * hard_regno_nregs (32, mode); > else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS)) > ret = 4 * hard_regno_nregs (FIRST_ALTIVEC_REGNO, mode); > + else if (reg_classes_intersect_p (rclass, DM_REGS)) > + ret = (rs6000_dense_math_register_move_cost (mode, VSX_REGS) > + + rs6000_memory_move_cost (mode, VSX_REGS, false)); > else > ret = 4 + rs6000_register_move_cost (mode, rclass, GENERAL_REGS); > > @@ -24087,6 +24210,8 @@ rs6000_compute_pressure_classes (enum reg_class > *pressure_classes) > if (TARGET_HARD_FLOAT) > pressure_classes[n++] = FLOAT_REGS; > } > + if (TARGET_DENSE_MATH) > + pressure_classes[n++] = DM_REGS; > pressure_classes[n++] = CR_REGS; > pressure_classes[n++] = SPECIAL_REGS; > > @@ -24251,6 +24376,10 @@ rs6000_debugger_regno (unsigned int regno, unsigned > int format) > return 67; > if (regno == 64) > return 64; > + /* XXX: This is a guess. The GCC register number for FIRST_DM_REGNO is > 111, > + but the frame pointer regnum uses that. */ > + if (DM_REGNO_P (regno)) > + return regno - FIRST_DM_REGNO + 112; > > gcc_unreachable (); > } > @@ -27490,9 +27619,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > unsigned offset = 0; > unsigned size = GET_MODE_SIZE (reg_mode); > > - /* If we are reading an accumulator register, we have to > - deprime it before we can access it. 
*/ > - if (TARGET_MMA > + /* If we are reading an accumulator register, we have to deprime it > + before we can access it unless we have dense math registers. */ > + if (TARGET_MMA && !TARGET_DENSE_MATH > && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) > emit_insn (gen_mma_xxmfacc (src, src)); > > @@ -27524,9 +27653,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > emit_insn (gen_rtx_SET (dst2, src2)); > } > > - /* If we are writing an accumulator register, we have to > - prime it after we've written it. */ > - if (TARGET_MMA > + /* If we are writing an accumulator register, we have to prime it > + after we've written it unless we have dense math registers. */ > + if (TARGET_MMA && !TARGET_DENSE_MATH > && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) > emit_insn (gen_mma_xxmtacc (dst, dst)); > > @@ -27540,7 +27669,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > || XINT (src, 1) == UNSPECV_MMA_ASSEMBLE); > gcc_assert (REG_P (dst)); > if (GET_MODE (src) == XOmode) > - gcc_assert (FP_REGNO_P (REGNO (dst))); > + gcc_assert ((TARGET_DENSE_MATH > + ? VSX_REGNO_P (REGNO (dst)) > + : FP_REGNO_P (REGNO (dst)))); > if (GET_MODE (src) == OOmode) > gcc_assert (VSX_REGNO_P (REGNO (dst))); > > @@ -27593,9 +27724,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > emit_insn (gen_rtx_SET (dst_i, op)); > } > > - /* We are writing an accumulator register, so we have to > - prime it after we've written it. */ > - if (GET_MODE (src) == XOmode) > + /* We are writing an accumulator register, so we have to prime it > + after we've written it unless we have dense math registers. */ > + if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH) > emit_insn (gen_mma_xxmtacc (dst, dst)); > > return; > @@ -27606,9 +27737,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > > if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst))) > { > - /* If we are reading an accumulator register, we have to > - deprime it before we can access it. 
*/ > - if (TARGET_MMA > + /* If we are reading an accumulator register, we have to deprime it > + before we can access it unless we have dense math registers. */ > + if (TARGET_MMA && !TARGET_DENSE_MATH > && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) > emit_insn (gen_mma_xxmfacc (src, src)); > > @@ -27634,9 +27765,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > i * reg_mode_size))); > } > > - /* If we are writing an accumulator register, we have to > - prime it after we've written it. */ > - if (TARGET_MMA > + /* If we are writing an accumulator register, we have to prime it after > + we've written it unless we have dense math registers. */ > + if (TARGET_MMA && !TARGET_DENSE_MATH > && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) > emit_insn (gen_mma_xxmtacc (dst, dst)); > } > @@ -27771,9 +27902,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > gcc_assert (rs6000_offsettable_memref_p (dst, reg_mode, true)); > } > > - /* If we are reading an accumulator register, we have to > - deprime it before we can access it. */ > - if (TARGET_MMA && REG_P (src) > + /* If we are reading an accumulator register, we have to deprime it > + before we can access it unless we have dense math registers. */ > + if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (src) > && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) > emit_insn (gen_mma_xxmfacc (src, src)); > > @@ -27803,9 +27934,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > j * reg_mode_size))); > } > > - /* If we are writing an accumulator register, we have to > - prime it after we've written it. */ > - if (TARGET_MMA && REG_P (dst) > + /* If we are writing an accumulator register, we have to prime it after > + we've written it unless we have dense math registers. 
*/
> +  if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (dst)
>        && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>      emit_insn (gen_mma_xxmtacc (dst, dst));
>
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 04709f0dcd6..5214a7c22ce 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -653,6 +653,7 @@ extern unsigned char rs6000_recip_bits[];
>  #define UNITS_PER_FP_WORD 8
>  #define UNITS_PER_ALTIVEC_WORD 16
>  #define UNITS_PER_VSX_WORD 16
> +#define UNITS_PER_DM_WORD 128
>
>  /* Type used for ptrdiff_t, as a string used in a declaration. */
>  #define PTRDIFF_TYPE "int"
> @@ -766,7 +767,7 @@ enum data_align { align_abi, align_opt, align_both };
>     Another pseudo (not included in DWARF_FRAME_REGISTERS) is soft frame
>     pointer, which is eventually eliminated in favor of SP or FP. */
>
> -#define FIRST_PSEUDO_REGISTER 111
> +#define FIRST_PSEUDO_REGISTER 119
>
>  /* Use standard DWARF numbering for DWARF debugging information. */
>  #define DEBUGGER_REGNO(REGNO) rs6000_debugger_regno ((REGNO), 0)
> @@ -803,7 +804,9 @@ enum data_align { align_abi, align_opt, align_both };
>     /* cr0..cr7 */ \
>     0, 0, 0, 0, 0, 0, 0, 0, \
>     /* vrsave vscr sfp */ \
> -   1, 1, 1 \
> +   1, 1, 1, \
> +   /* Dense math registers. */ \
> +   0, 0, 0, 0, 0, 0, 0, 0 \
>  }
>
>  /* Like `CALL_USED_REGISTERS' except this macro doesn't require that
> @@ -827,7 +830,9 @@ enum data_align { align_abi, align_opt, align_both };
>     /* cr0..cr7 */ \
>     1, 1, 0, 0, 0, 1, 1, 1, \
>     /* vrsave vscr sfp */ \
> -   0, 0, 0 \
> +   0, 0, 0, \
> +   /* Dense math registers. */ \
> +   0, 0, 0, 0, 0, 0, 0, 0 \
>  }
>
>  #define TOTAL_ALTIVEC_REGS (LAST_ALTIVEC_REGNO - FIRST_ALTIVEC_REGNO + 1)
> @@ -864,6 +869,7 @@ enum data_align { align_abi, align_opt, align_both };
>     v2 (not saved; incoming vector arg reg; return value)
>     v19 - v14 (not saved or used for anything)
>     v31 - v20 (saved; order given to save least number)
> +   dmr0 - dmr7 (not saved)

Shouldn't this be 'saved' here? Also, all the other entries are from a
higher register number to a lower register number. So this should be
dmr7 - dmr0.

>     vrsave, vscr (fixed)
>     sfp (fixed)
>  */
> @@ -906,6 +912,9 @@ enum data_align { align_abi, align_opt, align_both };
>     66, \
>     83, 82, 81, 80, 79, 78, \
>     95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, \
> +   /* Dense math registers. */ \
> +   111, 112, 113, 114, 115, 116, 117, 118, \

Here too, I believe the registers should be in decreasing order. And
should the entry for dense math registers occur below the entry for
register 110?

-Surya

> +   /* Vrsave, vscr, sfp. */ \
>     108, 109, \
>     110 \
>  }
> @@ -932,6 +941,9 @@ enum data_align { align_abi, align_opt, align_both };
>  /* True if register is a VSX register. */
>  #define VSX_REGNO_P(N) (FP_REGNO_P (N) || ALTIVEC_REGNO_P (N))
>
> +/* True if register is a Dense math register. */
> +#define DM_REGNO_P(N) ((N) >= FIRST_DM_REGNO && (N) <= LAST_DM_REGNO)
> +
>  /* Alternate name for any vector register supporting floating point, no matter
>     which instruction set(s) are available. */
>  #define VFLOAT_REGNO_P(N) \
> @@ -1069,6 +1081,7 @@ enum reg_class
>    FLOAT_REGS,
>    ALTIVEC_REGS,
>    VSX_REGS,
> +  DM_REGS,
>    VRSAVE_REGS,
>    VSCR_REGS,
>    GEN_OR_FLOAT_REGS,
> @@ -1098,6 +1111,7 @@ enum reg_class
>    "FLOAT_REGS", \
>    "ALTIVEC_REGS", \
>    "VSX_REGS", \
> +  "DM_REGS", \
>    "VRSAVE_REGS", \
>    "VSCR_REGS", \
>    "GEN_OR_FLOAT_REGS", \
> @@ -1132,6 +1146,8 @@ enum reg_class
>    { 0x00000000, 0x00000000, 0xffffffff, 0x00000000 }, \
>    /* VSX_REGS. */ \
>    { 0x00000000, 0xffffffff, 0xffffffff, 0x00000000 }, \
> +  /* DM_REGS. */ \
> +  { 0x00000000, 0x00000000, 0x00000000, 0x007f8000 }, \
>    /* VRSAVE_REGS. */ \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00001000 }, \
>    /* VSCR_REGS. */ \
> @@ -1159,7 +1175,7 @@ enum reg_class
>    /* CA_REGS. */ \
>    { 0x00000000, 0x00000000, 0x00000000, 0x00000004 }, \
>    /* ALL_REGS.
*/ \ > - { 0xffffffff, 0xffffffff, 0xffffffff, 0x00007fff } \ > + { 0xffffffff, 0xffffffff, 0xffffffff, 0x007fffff } \ > } > > /* The same information, inverted: > @@ -2060,7 +2076,16 @@ extern char rs6000_reg_names[][8]; /* register > names (0 vs. %r0). */ > &rs6000_reg_names[108][0], /* vrsave */ \ > &rs6000_reg_names[109][0], /* vscr */ \ > \ > - &rs6000_reg_names[110][0] /* sfp */ \ > + &rs6000_reg_names[110][0], /* sfp */ \ > + \ > + &rs6000_reg_names[111][0], /* dmr0 */ \ > + &rs6000_reg_names[112][0], /* dmr1 */ \ > + &rs6000_reg_names[113][0], /* dmr2 */ \ > + &rs6000_reg_names[114][0], /* dmr3 */ \ > + &rs6000_reg_names[115][0], /* dmr4 */ \ > + &rs6000_reg_names[116][0], /* dmr5 */ \ > + &rs6000_reg_names[117][0], /* dmr6 */ \ > + &rs6000_reg_names[118][0], /* dmr7 */ \ > } > > /* Table of additional register names to use in user input. */ > @@ -2114,6 +2139,8 @@ extern char rs6000_reg_names[][8]; /* register > names (0 vs. %r0). */ > {"vs52", 84}, {"vs53", 85}, {"vs54", 86}, {"vs55", 87}, \ > {"vs56", 88}, {"vs57", 89}, {"vs58", 90}, {"vs59", 91}, \ > {"vs60", 92}, {"vs61", 93}, {"vs62", 94}, {"vs63", 95}, \ > + {"dmr0", 111}, {"dmr1", 112}, {"dmr2", 113}, {"dmr3", 114}, \ > + {"dmr4", 115}, {"dmr5", 116}, {"dmr6", 117}, {"dmr7", 118}, \ > } > > /* This is how to output an element of a case-vector that is relative. */ > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 3089551552c..57a239791ee 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -51,6 +51,8 @@ (define_constants > (VRSAVE_REGNO 108) > (VSCR_REGNO 109) > (FRAME_POINTER_REGNUM 110) > + (FIRST_DM_REGNO 111) > + (LAST_DM_REGNO 118) > ]) > > ;;
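
As a cross-check of the register-class masks in the quoted rs6000.h hunk (DM_REGS and ALL_REGS), the following standalone C sketch recomputes one 32-bit mask word from a register range. It assumes the layout implied by the quoted tables, i.e. four 32-bit words where register R occupies bit R % 32 of word R / 32. The helper name class_mask_word is hypothetical and not part of GCC; the register numbers come from the quoted rs6000.md constants.

```c
#include <stdint.h>

/* Register numbers taken from the quoted rs6000.md hunk.  */
#define FIRST_DM_REGNO 111
#define LAST_DM_REGNO  118

/* Hypothetical helper, not part of GCC: return 32-bit word WORD of a
   register-class bitmask that has one bit set for every hard register
   in the inclusive range FIRST..LAST.  Register R is assumed to live
   in word R / 32 at bit position R % 32.  */
static uint32_t
class_mask_word (unsigned first, unsigned last, unsigned word)
{
  uint32_t mask = 0;

  for (unsigned r = first; r <= last; r++)
    if (r / 32 == word)
      mask |= UINT32_C (1) << (r % 32);

  return mask;
}
```

Under that assumption, class_mask_word (FIRST_DM_REGNO, LAST_DM_REGNO, 3) gives 0x007f8000 and class_mask_word (0, LAST_DM_REGNO, 3) gives 0x007fffff, which agrees with the fourth word of the DM_REGS and ALL_REGS entries in the patch.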
