On Tue, 2025-11-25 at 15:07 -0600, Peter Bergner wrote:
> On 11/25/25 10:31 AM, Avinash Jayakar wrote:
> > If the MMA is not available, can we just map the __vector_quad to 4
> > contiguous vsx registers (just like __vector_pair), in which case we
> > would not need to prime/deprime.
>
> Note that __vector_pair/__vector_quad are not just 2/4 contiguous VSX
> registers, but they are aligned too. Meaning __vector_pair reg pairs
> are even,odd registers (ie, 0-1,2-3,4-5,etc.), while the __vector_quad
> regs are even more aligned (0-3,4-7,8-11,etc.).
>
> Also note that although __vector_pair can use all 64 VSX registers,
> __vector_quad regs are limited to the lower 32 VSX registers, since
> they're acting as proxies for the 8 MMA accumulators, meaning you can't
> have a __vector_quad variable being allocated to the upper 32
> VSX/Altivec registers. If what you want to use __vector_quad for
> outside of MMA is to operate on them using Altivec (not VSX) insns,
> then you're out of luck currently.

Ok now I understand the intricacy in using __vector_quad.
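
To make the register constraints described above concrete, here is a
minimal sketch in C. The helper names (pair_start_ok, quad_start_ok,
quad_to_accumulator) are hypothetical and are not GCC functions; they
only model the numbering rules from the quoted text, assuming VSX
registers are numbered 0..63 and that accumulator N overlays registers
4N..4N+3.

```c
#include <stdbool.h>

/* Hypothetical helpers, not GCC internals.  They model the register
   numbering rules from the quoted text; VSX registers are 0..63.  */

/* A __vector_pair occupies an even,odd pair: 0-1, 2-3, ..., 62-63.  */
static bool pair_start_ok(int regno)
{
    return regno >= 0 && regno <= 62 && regno % 2 == 0;
}

/* A __vector_quad occupies 4 aligned registers in the lower 32 only:
   0-3, 4-7, ..., 28-31.  */
static bool quad_start_ok(int regno)
{
    return regno >= 0 && regno <= 28 && regno % 4 == 0;
}

/* Assumed mapping: the accumulator (0..7) that a valid quad stands in
   for, or -1 if regno is not a valid quad start.  */
static int quad_to_accumulator(int regno)
{
    return quad_start_ok(regno) ? regno / 4 : -1;
}
```

Under these assumptions, register 32 is a valid start for a pair but
not for a quad, which is the asymmetry pointed out above.
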
Perhaps I should rephrase this patch and instead say "Support movoo and
movxo when !TARGET_MMA". Since OOmode and XOmode are opaque modes, we
could implement the mov operations that the reload pass requires using
any set of registers. That should remove the need for the
rs6000_opaque_type_invalid_use_p function, which has some fragile error
messages (which are not seen using clang, btw). This way, code that
uses __vector_pair/__vector_quad without any builtins that require them
will not ICE, but will instead use whatever means are available to do
256-bit register-memory, memory-register and register-register moves.
Of course, when a builtin that needs __vector_pair/__vector_quad is
used, we would see the error about not using -mmma, at which point
OOmode is an aligned pair of VSX regs and XOmode is 4 aligned
contiguous registers in the lower 32.

> > So the issue is that the types __vector_pair and __vector_quad are
> > coupled with MMA, which introduces issues in adding potentially new
> > builtins that require a pair of vector registers, but perhaps they
> > may not have the MMA unit but may have the VSX unit (e.g.,
> > -mcpu=future -mno-mma). Potential new insns may make use of the
> > __vector_pair with just the VSX unit enabled.
> >
> > But I wanted to clarify one thing. In PR106736, you mention in
> > Comment #3, that __vector_quad and __vector_pair should only be used
> > for MMA. But can we not use it for other potential units?
>
> As I said in my previous reply, I think you could get __vector_pair
> working with !TARGET_MMA, since they are just even,odd VSX register
> pairs. My comment from the bugzilla was due to there was no outside
> of MMA usage of __vector_pair at that time.
>
> I think __vector_quad is much more problematical, since they not only
> represent the 8 pairs of 4 VSX register quads, but they also serve as
> proxies for the 8 MMA accumulators. Splitting that apart would be very
> very painful and fraught with risks of introducing bugs.
>
> You state that "perhaps they may not have the MMA unit" when using
> those __vector_quads. Does that mean they may have the MMA unit too?
> If so, I cannot see how you could get __vector_quad to work for your
> new intended usage when MMA is enabled, since it's so tied to MMA.
>
> I would think the easiest thing to do, is to make a new type (mode
> too?) that mimics the parts of what __vector_quad does that you want,
> but without the MMA baggage.

Right now there is no need for __vector_quad, but to fix the ICE issues
we will have to support movxo when !TARGET_MMA. I will send a new patch
with this change soon.

Thanks and regards,
Avinash Jayakar
