[PATCH 1/7] Add wD constraint to the PowerPC

Michael Meissner Wed, 01 Jul 2026 11:47:48 -0700

Add wD constraint.

This is part one of the dense math register patches for the PowerPC.
This is the 7th version of the dense math patches.


Version 6 of the dense math register patches was posted on April 21st,
2026.

 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713352.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713353.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713354.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713355.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713356.html
 * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713357.html

This patch needs the -mcpu=future patch posted on April 8th, 2026:

  * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712532.html

This particular patch did not change from version 6.

This patch adds a new constraint ('wD') that matches the accumulator registers
used by the MMA instructions.  Possible future PowerPC machines may have a new
set of 8 dense math accumulators that are 1,024 bits in size.  The 'wD'
constraint was chosen because the VSX constraints start with 'w'.  The 'wd'
constraint was already used, so I chose 'wD' to be similar.

To change code to possibly use dense math registers, the 'd' constraint should
be changed to 'wD', and the predicate 'fpr_reg_operand' should be changed to
'accumulator_operand'.
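
As a rough illustration of why the predicate has to change along with the
constraint, the standalone C sketch below models the hard-register test that
'accumulator_operand' performs in this patch.  It assumes the FPRs occupy hard
register numbers 32..63; MODEL_FIRST_FPR, MODEL_LAST_FPR and both functions are
hypothetical stand-ins, not the compiler's own macros.  It also ignores pseudo
registers (which the real predicate accepts) and SUBREGs (which it strips).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the FPR hard register range (an assumption
   made for this sketch, not taken from the patch).  */
enum { MODEL_FIRST_FPR = 32, MODEL_LAST_FPR = 63 };

/* Model of FP_REGNO_P: true for any FPR hard register.  */
static bool
model_fp_regno_p (int r)
{
  return r >= MODEL_FIRST_FPR && r <= MODEL_LAST_FPR;
}

/* Model of the hard-register test in accumulator_operand: the register
   must be an FPR and its number must be a multiple of 4, so that it
   names the first of 4 adjacent registers.  */
static bool
model_accumulator_regno_p (int r)
{
  return model_fp_regno_p (r) && (r & 3) == 0;
}
```

Under these assumptions, every fourth FPR passes the accumulator test and the
three registers after it do not, while a plain FPR test accepts all of them.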

On current power10/power11 systems, the accumulators overlap with the 32
traditional FPR registers (i.e. VSX vector registers 0..31).  Each accumulator
uses 4 adjacent FPR/VSX registers for a 512 bit logical register.

Possible future PowerPC machines would have these 8 accumulator registers be
separate registers, called dense math registers.  It is anticipated that when in
dense math register mode, the MMA instructions would use the accumulators
instead of the adjacent VSX registers.  I.e. in power10/power11 mode,
accumulator 1 will overlap with vector registers 4-7, but in dense math register
mode, accumulator 1 will be a separate register.
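
The power10/power11 overlap described above can be sketched in a few lines of
standalone C.  This is only a model of the numbering: the function and
constant names are hypothetical, and the 128-bit VSX register width is the one
fact used that is not stated in this message.

```c
/* Each power10/power11 accumulator overlaps 4 adjacent VSX registers,
   and each VSX register is 128 bits wide.  */
enum { VSRS_PER_ACC = 4, VSR_BITS = 128 };

/* First VSX register overlapped by accumulator ACC (0..7).  */
static int
acc_first_vsr (int acc)
{
  return acc * VSRS_PER_ACC;
}

/* Last VSX register overlapped by accumulator ACC (0..7).  */
static int
acc_last_vsr (int acc)
{
  return acc_first_vsr (acc) + VSRS_PER_ACC - 1;
}

/* Size in bits of the logical accumulator formed by the overlap.  */
static int
acc_bits (void)
{
  return VSRS_PER_ACC * VSR_BITS;
}
```

For accumulator 1 this gives VSX registers 4 through 7, and the 8 accumulators
together cover VSX registers 0..31.  In dense math register mode no such
mapping would apply, since the accumulators would be separate registers.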

Code compiled for power10/power11 systems will continue to work on the potential
future machine with dense math register support, but the compiler will have
fewer vector registers available for allocation because it believes the
accumulators are using vector registers.  For example, the file
mma-double-test.c in the gcc.target/powerpc testsuite directory has 8 more
register spills to/from the stack for power10/power11 code than when compiled
with dense math register support.

I have committed all of the patches in my backlog (dense math registers, other
-mcpu=future instructions, random bug fixes, support for _Float16 and
__bfloat16, and optimizations for vector logical operations on power10/power11)
into the IBM vendor branch:

        vendors/ibm/gcc-17-future

I have built bootstrap little endian compilers on power10 systems, and a
big endian compiler on power9 systems.  There were no regressions in the
tests.  Can I add the patches to the GCC trunk?

2026-07-01  Michael Meissner  <[email protected]>

gcc/

        * config/rs6000/constraints.md (wD): New constraint.
        * config/rs6000/predicates.md (accumulator_operand): New predicate.
        * config/rs6000/rs6000.cc (rs6000_debug_reg_global): Print the register
        class for the 'wD' constraint.
        (rs6000_init_hard_regno_mode_ok): Set up the 'wD' register constraint
        class.
        * config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add element for
        the 'wD' constraint.
        * doc/md.texi (PowerPC constraints): Document the 'wD' constraint.

---
 gcc/config/rs6000/constraints.md |  3 +++
 gcc/config/rs6000/predicates.md  | 18 ++++++++++++++++++
 gcc/config/rs6000/rs6000.cc      |  7 ++++++-
 gcc/config/rs6000/rs6000.h       |  1 +
 gcc/doc/md.texi                  |  5 +++++
 5 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index d0ed47faab8..0d1cde5bd4d 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -107,6 +107,9 @@ (define_constraint "wB"
        (match_test "TARGET_P8_VECTOR")
        (match_operand 0 "s5bit_cint_operand")))
 
+(define_register_constraint "wD" "rs6000_constraints[RS6000_CONSTRAINT_wD]"
+  "Accumulator register.")
+
 (define_constraint "wE"
   "@internal Vector constant that can be loaded with the XXSPLTIB instruction."
   (match_test "xxspltib_constant_nosplit (op, mode)"))
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 54dbc8bcc95..682fd2dc6e8 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -186,6 +186,24 @@ (define_predicate "vlogical_operand"
   return VLOGICAL_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is an accumulator.  On power10 systems, the accumulators
+;; overlap with the FPRs.
+(define_predicate "accumulator_operand"
+  (match_operand 0 "register_operand")
+{
+  if (SUBREG_P (op))
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  if (!HARD_REGISTER_P (op))
+    return 1;
+
+  int r = REGNO (op);
+  return FP_REGNO_P (r) && (r & 3) == 0;
+})
+
 ;; Return 1 if op is the carry register.
 (define_predicate "ca_operand"
   (match_operand 0 "register_operand")
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index afe5d45c125..5c17b05b829 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -2328,6 +2328,7 @@ rs6000_debug_reg_global (void)
           "wr reg_class = %s\n"
           "wx reg_class = %s\n"
           "wA reg_class = %s\n"
+          "wD reg_class = %s\n"
           "\n",
           reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
           reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
@@ -2335,7 +2336,8 @@ rs6000_debug_reg_global (void)
           reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]],
           reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
           reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wx]],
-          reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wA]]);
+          reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wA]],
+          reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wD]]);
 
   nl = "\n";
   for (m = 0; m < NUM_MACHINE_MODES; ++m)
@@ -2992,6 +2994,9 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
   if (TARGET_DIRECT_MOVE_128)
     rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
 
+  if (TARGET_MMA)
+    rs6000_constraints[RS6000_CONSTRAINT_wD] = FLOAT_REGS;
+
   /* Set up the reload helper and direct move functions.  */
   if (TARGET_VSX || TARGET_ALTIVEC)
     {
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 401f50ead4f..29d12517a10 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1187,6 +1187,7 @@ enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_wr,                /* GPR register if 64-bit  */
   RS6000_CONSTRAINT_wx,                /* FPR register for STFIWX */
   RS6000_CONSTRAINT_wA,                /* BASE_REGS if 64-bit.  */
+  RS6000_CONSTRAINT_wD,                /* Accumulator regs if MMA/Dense Math.  */
   RS6000_CONSTRAINT_MAX
 };
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 1ef748796f5..f227353bd82 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3294,6 +3294,11 @@ Like @code{d}, if @option{-mpowerpc-gfxopt} is used; otherwise, @code{NO_REGS}.
 @item wA
 Like @code{b}, if @option{-mpowerpc64} is used; otherwise, @code{NO_REGS}.
 
+@item wD
+Accumulator register if @option{-mma} is used; otherwise,
+@code{NO_REGS}.  For @option{-mcpu=power10} the accumulator registers
+overlap with VSX vector registers 0..31.
+
 @item wB
 Signed 5-bit constant integer that can be loaded into an Altivec register.
 
-- 
2.54.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: [email protected]
