Re: [to-be-committed][RISC-V][PR target/118734] Make using zero-strided loads a uarch tunable

Kito Cheng Mon, 23 Jun 2025 19:22:10 -0700

> >> -    riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (<MODE>mode),
> >> -                                   riscv_vector::UNARY_OP, operands);
> >> +    /* We cannot do anything with a Float16 mode apart from converting.
> >> +       So convert to float, broadcast and truncate.  */
> >> +    if (TARGET_ZVFHMIN && !TARGET_ZVFH && <VEL>mode == HFmode)
> >> +      {
> >> + rtx tmpsf = gen_reg_rtx (SFmode);
> >> + emit_insn (gen_extendhfsf2 (tmpsf, operands[1]));
> >> + poly_uint64 nunits = GET_MODE_NUNITS (<MODE>mode);
> >> + machine_mode vmodesf
> >> +  = riscv_vector::get_vector_mode (SFmode, nunits).require ();
> >> + rtx tmp = gen_reg_rtx (vmodesf);
> >> + rtx ops[] =  {tmp, tmpsf};
> >> + riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (vmodesf),
> >> +       riscv_vector::UNARY_OP, ops);
> >> + rtx ops2[] = {operands[0], tmp};
> >> + riscv_vector::emit_vlmax_insn (code_for_pred_trunc (vmodesf),
> >> +       riscv_vector::UNARY_OP_FRM_DYN, ops2);
> >
> > I disagree with this part, especially the comment: vlse for an HF
> > vector is just a 16-bit load, and a load does not care about the
> > data format, only its size.
> Hmm, we certainly can do a bit more.  Don't we have fmacs defined on
> HF and/or BF types through one of those obscure HF/BF extensions?


Yeah, we don't have fmacs for HF and BF if we only have ZVFHMIN, but
this part is a deoptimization.

The code gen path was:
If the value is in memory -> vlse
If the value is in either a GPR or an FPR -> spill to stack -> vlse

And now:
If the value is in memory -> load to FPR -> extendhfsf -> vfmv.v.f
(broadcast) -> vfncvt.f.f.w (trunc)
If the value is in an FPR -> extendhfsf -> vfmv.v.f (broadcast) ->
vfncvt.f.f.w (trunc)

Using pr115763-2.c as an example:

; w/o this patch, one vector load
fsh fa0,14(sp)
addi a5,sp,14
vsetivli zero,2,e16,mf4,ta,ma
vlse16.v v1,0(a5),zero

vs

; w/ this patch, two vector instructions (plus an extra vsetvli)
fcvt.s.h        fa0,fa0
vsetivli        zero,2,e32,mf2,ta,ma
vfmv.v.f        v1,fa0
vsetvli zero,zero,e16,mf4,ta,ma
vfncvt.f.f.w    v1,v1


> > Also we can put HF in GPR rather than FPR for those splat/broadcast
> > patterns in theory.
> In theory, yes.  But I don't think any of the patterns in the backend
> have constraints that would allow a GPR to hold a BF16 value.

We do have patterns that allow a GPR to hold BF16 and F16 values,
and riscv_hard_regno_mode_ok does not prevent GPRs from holding those
modes either:

(define_insn "*mov<mode>_hardfloat"
 [(set (match_operand:HFBF 0 "nonimmediate_operand" "=f,   f,f,f,m,m,*f,*r,  *r,*r,*m")
       (match_operand:HFBF 1 "move_operand"         " f,zfli,G,m,f,G,*r,*f,*G*r,*m,*r"))]
 "((TARGET_ZFHMIN && <MODE>mode == HFmode)
   || (TARGET_ZFBFMIN && <MODE>mode == BFmode))
  && (register_operand (operands[0], <MODE>mode)
      || reg_or_0_operand (operands[1], <MODE>mode))"
 { return riscv_output_move (operands[0], operands[1]); }
 [(set_attr "move_type" "fmove,fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
  (set_attr "type" "fmove,fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
  (set_attr "mode" "<MODE>")])

(define_insn "*mov<mode>_softfloat"
 [(set (match_operand:HFBF 0 "nonimmediate_operand" "=f, r,r,m,*f,*r")
       (match_operand:HFBF 1 "move_operand"         " f,Gr,m,r,*r,*f"))]
 "((!TARGET_ZFHMIN && <MODE>mode == HFmode) || (<MODE>mode == BFmode))
  && (register_operand (operands[0], <MODE>mode)
      || reg_or_0_operand (operands[1], <MODE>mode))"
 { return riscv_output_move (operands[0], operands[1]); }
 [(set_attr "move_type" "fmove,move,load,store,mtc,mfc")
  (set_attr "type" "fmove,move,load,store,mtc,mfc")
  (set_attr "mode" "<MODE>")])


>
>
> Given the objections, clearly this shouldn't be committed until those
> are resolved.

My objection is to removing "*pred_broadcast<mode>_zvfhmin" and to
those HF/BF16 changes. I propose splitting that part into a separate
patch, since it is not mentioned in the title or the git commit
message.

>
> Jeff
>
