Hi Peter, thanks for looking into this patch.
On Tue, 2025-11-25 at 09:06 -0600, Peter Bergner wrote:
> On 11/24/25 10:49 PM, Avinash Jayakar wrote:
> > As discussed, we need to relax the requirement on __vector_pair and
> > __vector_quad so that it is not tied up to MMA.
>
> I can believe enabling __vector_pair on pre-Power10 cpus would work,
> but I don't think you can do the same for __vector_quad, given its
> use as a proxy for the Power10 MMA accumulators.
If MMA is not available, can we just map __vector_quad to 4
contiguous VSX registers (just like __vector_pair)? In that case we
would not need to prime/deprime.
> The rs6000 port
> has some nasty code that was hard to get right that automatically
> emits the MMA insns xxmtacc & xxmfacc to prime & deprime the
> accumulators on XOmode loads and stores. That would all have to
> be disabled on pre-Power10 cpus.
>
> Can you remind me what problem you are trying to solve?
>
So the issue is that the types __vector_pair and __vector_quad are
coupled with MMA. This makes it hard to add new builtins that need a
pair of vector registers on targets that have the VSX unit but not the
MMA unit (e.g., -mcpu=future -mno-mma). Potential new insns may make
use of __vector_pair with just the VSX unit enabled.
But I wanted to clarify one thing. In comment #3 of PR106736, you
mention that __vector_quad and __vector_pair should only be used for
MMA. Can we not use them for other potential units?
Also, following the chain of PRs linked to 106736, I see many ICEs that
come from not implementing movoo/movxo for !TARGET_MMA. I think the
best solution would be to implement movoo/movxo for all targets, since
these are opaque modes and do not require values to stay in a
particular register. For example, for movoo, if neither MMA nor VSX is
available, we could fall back to implementing it with four contiguous
64-bit GPRs.
Something like this:
(define_insn_and_split "*movoo"
  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa,r,Z,r")
        (match_operand:OO 1 "input_operand" "ZwO,wa,wa,Z,r,r"))]
  "(TARGET_P9_VECTOR || TARGET_POWER8)
   && (gpc_reg_operand (operands[0], OOmode)
       || gpc_reg_operand (operands[1], OOmode))"
  "@
   #
   #
   #
   #
   #
   #"
  "&& reload_completed"
  [(const_int 0)]
{
  rs6000_split_multireg_move (operands[0], operands[1]);
  DONE;
})
The added "r,Z,r" constraint alternatives allow movoo to fall back to
GPRs, and rs6000_split_multireg_move would then split the move into
DImode moves.
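To illustrate what the GPR fallback amounts to, here is a rough
standalone C model. It is not GCC code, and the names regfile_t,
load_pair, store_pair and move_pair are made up for this sketch. It
assumes a 256-bit OOmode value occupies four consecutive 64-bit GPRs,
and that a register-to-register move has to pick a copy direction so
that overlapping register ranges are not clobbered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model only: 32 GPRs of 64 bits each, and a 256-bit
   value that occupies 4 consecutive GPRs.  */
enum { NUM_GPRS = 32, WORDS_PER_PAIR = 4 };

typedef struct {
  uint64_t gpr[NUM_GPRS];
} regfile_t;

/* Memory -> GPRs (the "r <- Z" alternative): four 64-bit loads.  */
void
load_pair (regfile_t *rf, int first, const uint64_t *mem)
{
  assert (first >= 0 && first + WORDS_PER_PAIR <= NUM_GPRS);
  for (int i = 0; i < WORDS_PER_PAIR; i++)
    rf->gpr[first + i] = mem[i];
}

/* GPRs -> memory (the "Z <- r" alternative): four 64-bit stores.  */
void
store_pair (const regfile_t *rf, int first, uint64_t *mem)
{
  assert (first >= 0 && first + WORDS_PER_PAIR <= NUM_GPRS);
  for (int i = 0; i < WORDS_PER_PAIR; i++)
    mem[i] = rf->gpr[first + i];
}

/* GPRs -> GPRs (the "r <- r" alternative): four 64-bit moves.
   When src < dst the ranges may overlap, so copy the highest word
   first to avoid overwriting a source word before it is read.  */
void
move_pair (regfile_t *rf, int dst, int src)
{
  assert (src >= 0 && src + WORDS_PER_PAIR <= NUM_GPRS);
  assert (dst >= 0 && dst + WORDS_PER_PAIR <= NUM_GPRS);
  if (src < dst)
    for (int i = WORDS_PER_PAIR - 1; i >= 0; i--)
      rf->gpr[dst + i] = rf->gpr[src + i];
  else
    for (int i = 0; i < WORDS_PER_PAIR; i++)
      rf->gpr[dst + i] = rf->gpr[src + i];
}
```

In this model, moving r3..r6 to r5..r8 copies r6 first and r3 last, so
the overlapping words survive.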
That way we could get rid of all the ICEs and of the error message
"__vector_pair requires -mmma option".
And since these types are used mostly by the builtins, the builtins
will still report an error if a required feature is not enabled, but
the compiler will not ICE as seen in PR103343.
Please let me know your thoughts on this.
Thanks and regards,
Avinash Jayakar
> Peter
>
>
>
> bergner@cfarm120:~$ cat vec_quad.c
> void
> food (__vector_quad *dst, __vector_quad *src)
> {
> *dst = *src;
> }
> bergner@cfarm120:~$ gcc -O2 -mcpu=power10 -S vec_quad.c
> bergner@cfarm120:~$ cat vec_quad.s
> .file "vec_quad.c"
> .machine power10
> .abiversion 2
> .section ".text"
> .align 2
> .p2align 4,,15
> .globl food
> .type food, @function
> food:
> .LFB0:
> .cfi_startproc
> .localentry food,1
> lxvp 2,0(4)
> lxvp 0,32(4)
> xxmtacc 0
> xxmfacc 0
> stxvp 2,0(3)
> stxvp 0,32(3)
> blr
> .long 0
> .byte 0,0,0,0,0,0,0,0
> .cfi_endproc
> .LFE0:
> .size food,.-food
> .ident "GCC: (GNU) 11.5.0 20240719 (Red Hat 11.5.0-11)"
> .section .note.GNU-stack,"",@progbits