> -    riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (<MODE>mode),
> -                                   riscv_vector::UNARY_OP, operands);
> +    /* We cannot do anything with a Float16 mode apart from converting.
> +       So convert to float, broadcast and truncate.  */
> +    if (TARGET_ZVFHMIN && !TARGET_ZVFH && <VEL>mode == HFmode)
> +      {
> +        rtx tmpsf = gen_reg_rtx (SFmode);
> +        emit_insn (gen_extendhfsf2 (tmpsf, operands[1]));
> +        poly_uint64 nunits = GET_MODE_NUNITS (<MODE>mode);
> +        machine_mode vmodesf
> +          = riscv_vector::get_vector_mode (SFmode, nunits).require ();
> +        rtx tmp = gen_reg_rtx (vmodesf);
> +        rtx ops[] = {tmp, tmpsf};
> +        riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (vmodesf),
> +                                       riscv_vector::UNARY_OP, ops);
> +        rtx ops2[] = {operands[0], tmp};
> +        riscv_vector::emit_vlmax_insn (code_for_pred_trunc (vmodesf),
> +                                       riscv_vector::UNARY_OP_FRM_DYN, ops2);

I disagree with this part, especially the comment. A vlse on an HF
vector is just a 16-bit load, and a load cares only about the size of
the data, not its format.
Also, in theory we can keep the HF value in a GPR rather than an FPR
for these splat/broadcast patterns.
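
To illustrate the point, here is a rough scalar model in plain C of what
a zero-stride 16-bit load does. It is only a sketch: broadcast_u16 is a
made-up helper name, not anything in GCC or the RVV intrinsics, and the
loop stands in for the vector instruction. The payload is treated as an
opaque 16-bit pattern and is never interpreted as a float.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a zero-stride 16-bit vector load.
   Every destination element reads the same source address, and the
   value is moved as raw bits without any floating-point handling.  */
static void
broadcast_u16 (uint16_t *dst, const uint16_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[0 * i];  /* stride 0: always element 0 of src */
}
```

Feeding it the bit pattern 0x3C00 (1.0 in IEEE binary16) fills the
destination with 0x3C00, with no conversion to float in between.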

> +      }
> +    else
> +      riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (<MODE>mode),
> +                                     riscv_vector::UNARY_OP, operands);

On Tue, Jun 24, 2025 at 8:47 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
> This is primarily work from Robin and Shreya.  My contribution is just
> mentoring for Shreya and writing the ChangeLog.  Shreya is busy on a
> code generation issue and I expect both new entries in the tuning
> structure as well as new instances of the tuning structure in the works
> (spacemit x60) coming relatively soon.
>
> Late breaking news is that we are going to need to add some additional
> alignment checks to this code.  That's a preexisting issue and after
> some discussions with Robin and a bit of pondering on my side I've
> decided to go forward with this change now.   Robin is already looking
> at alignment issues WRT strided, indexed and presumably segmented memory
> references and will cover the issue as part of that work.
>
> --
>
> So the basic idea here is to give uarchs the ability to enable/disable
> using the zero strided load idiom to broadcast a single memory element
> across a vector.  While long term I would expect most if not all designs
> to support this efficiently, I could easily see some vector designs not
> having implementations of this optimization in the short term.
>
> It didn't seem worth the effort to have a --param here.  If folks think
> it's really needed, it certainly could be added.   Though I suspect it's
> primarily a test, set and forget process for each design with vector
> support.
>
> This has been in my tester a few days.  Waiting for pre-commit CI to do
> its thing.
>
> jeff
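
For readers following the thread, the per-design switch described in the
quoted message could be modelled roughly as below. This is a sketch
under assumptions: struct uarch_tune, its field, and choose_broadcast
are hypothetical names for illustration and are not the actual GCC
tuning structure or the code in the patch.

```c
#include <stdbool.h>

/* Hypothetical per-design tuning record; not GCC's actual structure.  */
struct uarch_tune
{
  bool use_zero_stride_load;  /* design handles stride-0 loads well */
};

enum broadcast_strategy
{
  BROADCAST_ZERO_STRIDE_LOAD,  /* one strided load with stride 0 */
  BROADCAST_SCALAR_LOAD_SPLAT  /* scalar load, then splat from register */
};

/* Pick how to broadcast a single memory element across a vector,
   based on what the selected design is tuned for.  */
static enum broadcast_strategy
choose_broadcast (const struct uarch_tune *tune)
{
  return tune->use_zero_stride_load ? BROADCAST_ZERO_STRIDE_LOAD
                                    : BROADCAST_SCALAR_LOAD_SPLAT;
}
```

A single boolean per design matches the "test, set and forget" usage
described in the message, which is presumably why a --param was not
considered necessary.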
