On Tue, 2025-11-25 at 15:07 -0600, Peter Bergner wrote:
> On 11/25/25 10:31 AM, Avinash Jayakar wrote:
> > If the MMA is not available, can we just map the __vector_quad to 4
> > contiguous vsx registers (just like __vector_pair), in which case we
> > would not need to prime/deprime.
> 
> Note that __vector_pair/__vector_quad are not just 2/4 contiguous VSX
> registers, but they are aligned too.  Meaning __vector_pair reg pairs
> are even,odd registers (ie, 0-1,2-3,4-5,etc.), while the __vector_quad
> regs are even more aligned (0-3,4-7,8-11,etc.).
> 
> Also note that although __vector_pair can use all 64 VSX registers,
> __vector_quad regs are limited to the lower 32 VSX registers, since
> they're acting as proxies for the 8 MMA accumulators, meaning you can't
> have a __vector_quad variable being allocated to the upper 32
> VSX/Altivec registers.  If what you want to use __vector_quad for
> outside of MMA is to operate on them using Altivec (not VSX) insns,
> then you're out of luck currently.
> 
Ok, now I understand the intricacy of using __vector_quad.

Perhaps I should rephrase this patch and instead say
"Support movoo and movxo when !TARGET_MMA".

Since OO and XO are opaque modes, we could implement the mov operations
that the reload pass requires for them in any set of registers.  That
should remove the need for the rs6000_opaque_type_invalid_use_p
function, which has some fragile error messages (which are not seen
with clang, btw).

This way, code that uses __vector_pair/__vector_quad without any
builtins that require them will not ICE; it will instead just generate
whatever moves are needed for the 256-bit (512-bit for __vector_quad)
value: register to memory, memory to register and register to
register.  Of course, when a builtin that needs
__vector_pair/__vector_quad is used, we would still see the error about
not using -mmma.  With -mmma, OOmode is an aligned (even,odd) pair of
VSX registers and XOmode is 4 contiguous, aligned registers in the
lower 32 VSX registers.


>
> > So the issue is that the types __vector_pair and __vector_quad are
> > coupled with MMA, which introduces issues in adding potentially new
> > builtins that require a pair of vector registers, but perhaps they may
> > not have the MMA unit but may have the VSX unit (e.g., -mcpu=future
> > -mno-mma). Potential new insns may make use of the __vector_pair with
> > just the VSX unit enabled.
> > 
> > But I wanted to clarify one thing. In PR106736, you mention in Comment
> > #3, that __vector_quad and __vector_pair should only be used for MMA.
> > But can we not use it for other potential units?
> 
> As I said in my previous reply, I think you could get __vector_pair
> working with !TARGET_MMA, since they are just even,odd VSX register
> pairs.  My comment from the bugzilla was due to there was no outside
> of MMA usage of __vector_pair at that time.
> 
> I think __vector_quad is much more problematical, since they not only
> represent the 8 pairs of 4 VSX register quads, but they also serve as
> proxies for the 8 MMA accumulators.  Splitting that apart would be very
> very painful and fraught with risks of introducing bugs.
> 
> You state that "perhaps they may not have the MMA unit" when using
> those __vector_quads.  Does that mean they may have the MMA unit too?
> If so, I cannot see how you could get __vector_quad to work for your
> new intended usage when MMA is enabled, since it's so tied to MMA.
> 
> I would think the easiest thing to do, is to make a new type (mode too?)
> that mimics the parts of what __vector_quad does that you want, but
> without the MMA baggage.
> 
Right now there is no need for __vector_quad outside of MMA, but to fix
the ICE issues we will have to support movxo when !TARGET_MMA.

I will send a new patch with this change soon.

Thanks and regards,
Avinash Jayakar
