Re: [PATCH][RFC] RISC-V: Allow FP strided broadcast from memory [PR121451]

Jeff Law Sat, 18 Oct 2025 15:09:12 -0700



On 9/23/25 1:45 PM, Paul-Antoine Arras wrote:

I experimented with this patch, which allows removing a vfmv when a floating-point operand can be loaded directly from memory with a zero-stride vlse.
In terms of benchmarks, I measured the following reductions in icount:
* 503.bwaves: -4.0%
* 538.imagick: -3.3%
* 549.fotonik3d: -0.34%
However, the icount for 507.cactuBSSN increased by 0.43%. In addition, measurements on the BPI board show that the patch actually increases execution times by 5 to 11%.
This may still be beneficial for some uarchs but would have to be tunable, wouldn't it?
Is it worth proceeding with this?

It's probably worth investigating. Do you happen to have A/B binaries handy still? I could throw them onto our design.

Austin and I tested the BPI for the zero-strided load idiom, but just on the integer side, and it looked like it likely supported optimizing those into a single load + an internal broadcast across the vector. So it's a bit of a surprise to see it not performing well at all for FP.
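
For anyone following along, here is a minimal C sketch of what the idiom means semantically. It is only a scalar model written for illustration, not the patch, not GCC code, and not RVV intrinsics; the function names are hypothetical. A strided load with a stride of zero reads the same address for every element, so its result matches a single scalar load followed by a broadcast, which is why hardware can choose to implement it that way.

```c
#include <stddef.h>

/* Scalar model (illustrative only) of a strided FP load: element i is
   read from base + i * stride_bytes.  With stride_bytes == 0, every
   element comes from the same address, i.e. a broadcast from memory.  */
void model_strided_load(double *dst, const double *base,
                        ptrdiff_t stride_bytes, size_t vl)
{
    const char *p = (const char *)base;
    for (size_t i = 0; i < vl; i++)
        dst[i] = *(const double *)(p + (ptrdiff_t)i * stride_bytes);
}

/* Scalar model (illustrative only) of the alternative sequence:
   one scalar load, then a splat of that value across the vector.  */
void model_load_then_splat(double *dst, const double *base, size_t vl)
{
    double x = *base;
    for (size_t i = 0; i < vl; i++)
        dst[i] = x;
}

/* Returns 1 if both models produce the same vl elements (vl <= 64),
   0 otherwise.  */
int models_agree(double value, size_t vl)
{
    double a[64], b[64];
    if (vl > 64)
        return 0;
    model_strided_load(a, &value, 0, vl);
    model_load_then_splat(b, &value, vl);
    for (size_t i = 0; i < vl; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

The two sequences are equivalent in result, so the question on any given uarch is purely which one executes faster.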

Note there is an entry in the riscv_tune_param structure controlling the zero-stride idiom. So you could test that quite easily and, assuming the port had things defined properly, it would just work.


Jeff
