Hi Peter, thanks for looking into this patch. 

On Tue, 2025-11-25 at 09:06 -0600, Peter Bergner wrote:
> On 11/24/25 10:49 PM, Avinash Jayakar wrote:
> > As discussed, we need to relax the requirement on __vector_pair and
> > __vector_quad so that it is not tied up to MMA.
> 
> I can believe enabling __vector_pair on pre-Power10 cpus would work,
> but I don't think you can do the same for __vector_quad, given its
> use as a proxy for the Power10 MMA accumulators. 

If MMA is not available, can we just map __vector_quad to 4
contiguous VSX registers (just like __vector_pair)? In that case we
would not need to prime/deprime.

>  The rs6000 port
> has some nasty code that was hard to get right that automatically
> emits the MMA insns xxmtacc & xxmfacc to prime & deprime the
> accumulators on XOmode loads and stores.  That would all have to
> be disabled on pre-Power10 cpus.
> 
> Can you remind me what problem you are trying to solve?
> 

So the issue is that the types __vector_pair and __vector_quad are
coupled with MMA. That makes it hard to add potential new builtins that
need a pair of vector registers on targets that have the VSX unit but
not the MMA unit (e.g., -mcpu=future -mno-mma). Potential new insns may
make use of __vector_pair with just the VSX unit enabled.

But I wanted to clarify one thing. In PR106736, Comment #3, you mention
that __vector_quad and __vector_pair should only be used for MMA.
But can we not use them for other potential units?

Also, following the chain of PRs linked to 106736, I see many ICEs that
come from movoo/movxo not being implemented for !TARGET_MMA. I think the
best solution would be to implement movoo/movxo for all targets, since
they are opaque modes and do not require values to stay in a particular
register. For example, for movoo, if there is no MMA/VSX we could fall
back to implementing it with four contiguous 64-bit GPRs.
Something like this:
(define_insn_and_split "*movoo"
  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa,r,Z,r")
        (match_operand:OO 1 "input_operand" "ZwO,wa,wa,Z,r,r"))]
  "(TARGET_P9_VECTOR || TARGET_POWER8)
   && (gpc_reg_operand (operands[0], OOmode)
       || gpc_reg_operand (operands[1], OOmode))"
  "@
   #
   #
   #
   #
   #
   #"
  "&& reload_completed"
  [(const_int 0)]
{
  rs6000_split_multireg_move (operands[0], operands[1]);
  DONE;
})

The added alternatives ("r,Z,r" for operand 0, "Z,r,r" for operand 1)
allow movoo to fall back on GPRs, and rs6000_split_multireg_move would
then use DImode to split the moves.
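
To illustrate what the GPR fallback amounts to, here is a rough sketch
in portable C. It is only a model, not GCC code: the opaque256 and
gpr_quad types and the helper names are made up for illustration, and
the real splitting would be done on RTL by rs6000_split_multireg_move.
The point is just that one 256-bit opaque move becomes four 64-bit
(DImode-sized) moves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an OOmode value: 32 opaque bytes.  */
typedef struct { unsigned char bytes[32]; } opaque256;

/* Hypothetical model of four contiguous 64-bit GPRs.  */
typedef struct { uint64_t reg[4]; } gpr_quad;

/* Load: one 256-bit move is split into four 64-bit moves.  */
static void
load_oo_to_gprs (gpr_quad *dst, const opaque256 *src)
{
  for (int i = 0; i < 4; i++)
    memcpy (&dst->reg[i], src->bytes + 8 * i, sizeof (uint64_t));
}

/* Store: the reverse direction, again as four 64-bit moves.  */
static void
store_gprs_to_oo (opaque256 *dst, const gpr_quad *src)
{
  for (int i = 0; i < 4; i++)
    memcpy (dst->bytes + 8 * i, &src->reg[i], sizeof (uint64_t));
}

/* Memory-to-memory copy that goes through the modelled GPRs.  */
static void
copy_oo (opaque256 *dst, const opaque256 *src)
{
  gpr_quad tmp;

  /* The value must fit exactly in four 64-bit pieces.  */
  assert (sizeof (opaque256) == 4 * sizeof (uint64_t));
  load_oo_to_gprs (&tmp, src);
  store_gprs_to_oo (dst, &tmp);
}
```

Since the mode is opaque, nothing in this model interprets the bytes;
they are only moved, which is why any register class wide enough in
total should be acceptable for a plain copy.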


So we could get rid of all the ICEs and the error message saying
"__vector_pair requires -mmma option".
And since these types are used mostly by the builtins, the builtins will
still error out if a feature is not enabled, but will not ICE as seen in
PR103343.

Please let me know your thoughts on this.

Thanks and regards,
Avinash Jayakar

> Peter
> 
> 
> 
> bergner@cfarm120:~$ cat vec_quad.c
> void
> food (__vector_quad *dst, __vector_quad *src)
> {
>   *dst = *src;
> }
> bergner@cfarm120:~$ gcc -O2 -mcpu=power10 -S vec_quad.c
> bergner@cfarm120:~$ cat vec_quad.s
>       .file   "vec_quad.c"
>       .machine power10
>       .abiversion 2
>       .section        ".text"
>       .align 2
>       .p2align 4,,15
>       .globl food
>       .type   food, @function
> food:
> .LFB0:
>       .cfi_startproc
>       .localentry     food,1
>       lxvp 2,0(4)
>       lxvp 0,32(4)
>       xxmtacc 0
>       xxmfacc 0
>       stxvp 2,0(3)
>       stxvp 0,32(3)
>       blr
>       .long 0
>       .byte 0,0,0,0,0,0,0,0
>       .cfi_endproc
> .LFE0:
>       .size   food,.-food
>       .ident  "GCC: (GNU) 11.5.0 20240719 (Red Hat 11.5.0-11)"
>       .section        .note.GNU-stack,"",@progbits
