This patch set is a modification of the V6 patches that I sent out on
April 21st, 2026.

In particular, I made changes to address the comments posted in
February that I had not fully addressed previously.

Here is the comment from February:

  * https://gcc.gnu.org/pipermail/gcc-patches/2026-February/708071.html

Here is my reply:

  * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/715248.html

Here are the V6 patches posted on April 21st, 2026:

  * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713352.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713353.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713354.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713356.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713357.html

There are 7 patches in this patch set:

Patch #1 adds the wD constraint and the accumulator_operand predicate.

Patch #2 switches mma.md to use the wD constraint and accumulator_operand
predicate.

Patch #3 adds the -mdense-math option, but in this patch, -mdense-math is not
yet implemented; the implementation comes in patch #4.

Patch #4 adds support for 512-bit dense math registers.

Patch #5 adds support for 1,024-bit dense math registers.

Patch #6 is an optional patch that changes the names of the MMA instructions
from the original names used in the power10/power11 timeline to new alternate
names that have 'dm' (for dense math) in the instruction name.  Note: this is a
new patch in the V7 patch set.

Patch #7 clones the MMA builtin tests to test the code generated for MMA
instructions when -mcpu=future is used.  Note: this is a new patch in the V7
patch set.  If patch #6 is not applied, this patch will need to be modified.

I have committed all of the patches in my backlog (dense math registers, other
-mcpu=future instructions, random bug fixes, support for _Float16 and
__bfloat16, and optimizations for vector logical operations on power10/power11)
into the IBM vendor branch:

        vendors/ibm/gcc-17-future

The following is the description of dense math registers from previous
versions of the patches.

The Dense Math Facility (DMF) is designed to be an extension to the ISA
3.1 (i.e., power10/power11) MMA facility.  Since these are patches for a
future machine, the Dense Math Facility may appear in future PowerPC
machines, or it may never be used in real hardware.

One of the key concepts of the DMF system is that the accumulators used
by MMA and the DMF extensions become separate registers, rather than
being overlaid on the traditional floating point registers (i.e., VSX
registers 0..31).

In addition to being separate registers, the dense math accumulators
are now logically 1,024 bits instead of 512 bits.

Because of the way the Dense Math registers and instructions are
designed, the existing power10/power11 MMA instructions that operate on
512 bits will work with Dense Math.  In ISA 3.1, each of the 8
accumulators is overlaid on 4 adjacent FPRs, and the compiler must not
touch those 4 FPRs while the MMA accumulator is in use.
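As a rough illustration of the ISA 3.1 overlay, the sketch below models
accumulator i as occupying VSX registers 4*i through 4*i+3 (i.e., FPRs),
and computes a bitmask of the FPRs that cannot be used for other values
while a given set of accumulators is live.  This is a standalone model,
not GCC code; the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model, not GCC code.  In ISA 3.1, accumulator ACC[i]
   (i = 0..7) is overlaid on VSX registers 4*i .. 4*i+3, which are
   also FPRs 4*i .. 4*i+3.  */
#define NUM_ACCUMULATORS 8
#define FPRS_PER_ACC     4

/* Return the first VSX register (FPR) overlaid by accumulator ACC.  */
static int
acc_first_vsr (int acc)
{
  return acc * FPRS_PER_ACC;
}

/* LIVE_ACCS has bit i set if ACC[i] is live.  Return a mask where bit r
   set means FPR r must not be allocated for other values while those
   accumulators are live.  */
static uint32_t
reserved_fprs (uint8_t live_accs)
{
  uint32_t mask = 0;
  for (int acc = 0; acc < NUM_ACCUMULATORS; acc++)
    if (live_accs & (1u << acc))
      mask |= 0xFu << acc_first_vsr (acc);
  return mask;
}
```

For example, with ACC[0] and ACC[2] live, reserved_fprs (0x05) returns
0x00000f0f, i.e., FPRs 0-3 and 8-11 are off limits.  With all 8
accumulators live, every one of FPRs 0..31 is reserved.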

In the Dense Math system, the accumulator is a separate register.  When
-mcpu=power11 or -mcpu=power10 is used, however, GCC will not allocate
the FPR (VSX) registers overlaid by the accumulators when generating MMA
instructions.

If a function compiled for power10/power11 is run on a system with
Dense Math support enabled, the effect is that a number of FPRs will not
be allocated, because the compiler assumes the accumulators are overlaid
on them.  After these patches are applied, if the user compiles the code
with -mcpu=future, the compiler can allocate up to 32 more vector
registers, because the Dense Math accumulators are separate registers.
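The "up to 32 more" figure follows from 8 accumulators times 4 FPRs each.
The sketch below is a hypothetical model (not GCC's register allocator;
all names are made up for illustration) of how many of the 32 FPRs remain
allocatable for a given set of live accumulators, with and without dense
math.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not GCC code.  There are 32 FPRs (VSX registers
   0..31).  Without dense math, each live accumulator hides 4 of them.
   With dense math (-mcpu=future), accumulators are separate registers,
   so no FPRs are hidden.  */
#define NUM_FPRS     32
#define FPRS_PER_ACC 4

/* Count the set bits in LIVE_ACCS (one bit per accumulator).  */
static int
count_live_accs (uint8_t live_accs)
{
  int n = 0;
  for (; live_accs != 0; live_accs &= (uint8_t) (live_accs - 1))
    n++;
  return n;
}

/* Number of FPRs still available for other values.  */
static int
allocatable_fprs (uint8_t live_accs, bool dense_math)
{
  if (dense_math)
    return NUM_FPRS;
  return NUM_FPRS - FPRS_PER_ACC * count_live_accs (live_accs);
}
```

With all 8 accumulators live, allocatable_fprs (0xFF, false) is 0 while
allocatable_fprs (0xFF, true) is 32, which is where the difference of up
to 32 registers comes from.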

In fact, two of the MMA tests (mma-double-test.c and mma-single-test.c)
do about 20 fewer spills of floating point values to the stack, since
the compiler can allocate those FPR vector registers for other
purposes.

These 7 patches will allow GCC to allocate these registers if the
-mcpu=future option is used.

  1: The first patch adds a new constraint (%wD) that can be used by
     patterns that generate MMA instructions.  If the user uses
     -mcpu=power10 or -mcpu=power11, %wD will act like %d and require the
     register to be one of VSX registers 0..31.  If the user uses
     -mcpu=future, the new separate dense math accumulators will be used.

  2: This patch modifies the config/rs6000/mma.md file to use the wD
     constraint.

  3: This patch adds the -mdense-math option, but it does not add support for
     dense math registers until patch #4.

  4: This patch adds support for the current 512-bit MMA instructions
     to use separate accumulators in the dense math registers instead of
     being overlaid on top of VSX registers 0..31.

  5: This patch adds support for an extension to MMA where the
     accumulators grow to 1,024 bits instead of 512 bits.

  6: This patch is an optional patch that adds comments to the various
     MMA insns that explain which MMA instructions are generated by each
     insn.

  7: This patch adds new tests for dense math support.

This patch set is the foundation for the Dense Math support.  Other
patches may be added on top of it to support potential new features
added to the Dense Math Facility.

I have bootstrapped little endian compilers on power10 systems and big
endian compilers on power9 systems.  There were no regressions in the
tests.  Can I add the patches to the GCC trunk?


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
