Hi!

On Wed, Nov 09, 2022 at 09:43:16PM -0500, Michael Meissner wrote:
> This patch is very preliminary support for a potential new feature to the
> PowerPC that extends the current power10 MMA architecture.  This feature may
> or may not be present in any specific future PowerPC processor.

MMA is an optional facility in ISA 3.1 -- please don't say it is power10
only.

> In the current MMA subsystem for Power10, there are 8 512-bit accumulator
> registers.  These accumulators are each tied to sets of 4 FPR registers.

Four VSRs.  FPRs are only 64 bits.  You mean this is VSRs 0..31.
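
To make the numbering concrete, here is a minimal sketch of that overlap in
plain C: accumulator N sits on top of VSRs 4*N .. 4*N+3, and four 128-bit
VSRs make up the 512 bits.  The function and constant names are made up for
illustration; nothing here is a GCC or ISA interface.

```c
#include <assert.h>

/* Illustrative constants: 8 accumulators, 4 VSRs each, 128 bits per VSR. */
enum { NUM_ACCS = 8, VSRS_PER_ACC = 4, VSR_BITS = 128 };

/* Number of the first VSR overlapped by accumulator `acc` (0..7). */
unsigned acc_first_vsr(unsigned acc)
{
    assert(acc < NUM_ACCS);
    return acc * VSRS_PER_ACC;
}

/* Number of the last VSR overlapped by accumulator `acc` (0..7). */
unsigned acc_last_vsr(unsigned acc)
{
    return acc_first_vsr(acc) + VSRS_PER_ACC - 1;
}

/* Width of one accumulator in bits: four VSRs of 128 bits each. */
unsigned acc_bits(void)
{
    return VSRS_PER_ACC * VSR_BITS;
}
```

So accumulator 0 covers VSRs 0..3 and accumulator 7 covers VSRs 28..31,
which is where the 0..31 range comes from.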

> When you issue a prime instruction, it makes sure the accumulator is a copy
> of the 4

I suppose you mean the xxmtacc instruction?

> FPR registers the accumulator is tied to.  When you issue a deprime
> instruction, it makes sure that the accumulator data content is logically
> copied to the matching FPR register.

And xxmfacc.
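
As a rough mental model only, the direction of the two moves can be sketched
in portable C.  The struct, the function names and the byte layout inside the
accumulator are assumptions made for illustration; this does not model the
additional rules on when the VSRs or the accumulator may be accessed.

```c
#include <assert.h>
#include <string.h>

/* Illustrative sizes: 8 accumulators, 4 VSRs each, 16 bytes per VSR. */
enum { NUM_ACCS = 8, VSRS_PER_ACC = 4, VSR_BYTES = 16 };

/* Hypothetical model of the register state; not a real interface. */
struct regfile_model {
    unsigned char vsr[64][VSR_BYTES];
    unsigned char acc[NUM_ACCS][VSRS_PER_ACC * VSR_BYTES];
};

/* "Prime" (xxmtacc): copy VSRs 4*at .. 4*at+3 into accumulator `at`. */
void model_xxmtacc(struct regfile_model *rf, unsigned at)
{
    assert(at < NUM_ACCS);
    for (unsigned i = 0; i < VSRS_PER_ACC; i++)
        memcpy(rf->acc[at] + i * VSR_BYTES,
               rf->vsr[at * VSRS_PER_ACC + i], VSR_BYTES);
}

/* "Deprime" (xxmfacc): copy accumulator `at` back to VSRs 4*at .. 4*at+3. */
void model_xxmfacc(struct regfile_model *rf, unsigned at)
{
    assert(at < NUM_ACCS);
    for (unsigned i = 0; i < VSRS_PER_ACC; i++)
        memcpy(rf->vsr[at * VSRS_PER_ACC + i],
               rf->acc[at] + i * VSR_BYTES, VSR_BYTES);
}
```

In this model, priming accumulator 1 reads VSRs 4..7, and depriming it writes
the same four registers back.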

Very importantly all the other rules in 7.2.1.3 "VSX Accumulators"
apply as well.  That should make old code work on new systems
transparently.

> In terms of changes, we now use the wD constraint for accumulators.  If you
> compile with -mcpu=power10, the wD constraint will match the equivalent FPR
> register that overlaps with the accumulator.

The set of *four* *VSX* registers.  Of course in the end it is just a
number, but :-)

> If you compile with -mcpu=future,
> the wD constraint will match the DMR register and not the FPR register.

Constraints do not "match" anything.  "Will allow" perhaps?

> In general, if you only use the built-in functions, things work between the
> two systems.  If you use extended asm, you will likely need to modify the
> code.  Going forward, hopefully if you modify your code to use the wD
> constraint and %A output modifier, you can write code that switches more
> easily between the two systems.

You *already* are required to follow all these rules that make this
painless and transparent.

> There is one bug that I noticed.  When you use the full DMR instruction the
> constant copy propagation patch issues internal errors.  I believe this is due
> to the CCP pass not handling opaque types cleanly enough, and it only shows up
> in larger types.  I would like to get these patches committed, and then work
> with the maintainers of the CCP to fix the problem.

Erm.  If the compiler ICEs, we cannot include this code.  But hopefully
you mean something else?


Segher
