Re: [PATCH][RFC] RISC-V: Allow FP strided broadcast from memory [PR121451]

Jeff Law Wed, 01 Oct 2025 05:54:59 -0700



On 9/25/25 4:05 AM, Paul-Antoine Arras wrote:

Hi Jeff,

On 23/09/2025 22:39, Jeff Law wrote:
On 9/23/25 1:45 PM, Paul-Antoine Arras wrote:
I experimented with this patch, which allows removing a vfmv when a floating-point operand can be loaded directly from memory with a zero-stride vlse.
In terms of benchmarks, I measured the following reductions in icount:
* 503.bwaves: -4.0%
* 538.imagick: -3.3%
* 549.fotonik3d: -0.34%
However, the icount for 507.cactuBSSN increased by 0.43%. In addition, measurements on the BPI board show that the patch actually increases execution times by 5 to 11%.
This may still be beneficial for some uarchs but would have to be tunable, wouldn't it?
Is it worth proceeding with this?
It's probably worth investigating. Do you happen to have A/B binaries handy still? I could throw them onto our design.
Yes, you'll find attached the two binaries I built and tested on the BPI.

I built A/B binaries for bwaves and just ran input #1 on our design. The results roughly match yours: about a 5% regression in performance with a 5% improvement in icount.
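
To put those two numbers together: as a rough sketch, assuming both binaries run at the same clock frequency (an assumption, not something measured here), run time scales with icount times cycles per instruction, so the implied change in CPI can be worked out as below. The function name `cpi_ratio` is only illustrative.

```python
def cpi_ratio(icount_change, time_change):
    """Return CPI_B / CPI_A for fractional changes in icount and run time.

    Assumes a fixed clock, so time is proportional to icount * CPI.
    """
    return (1.0 + time_change) / (1.0 + icount_change)


# Rounded figures from the bwaves run: icount -5%, run time +5%.
ratio = cpi_ratio(icount_change=-0.05, time_change=0.05)
print(f"implied CPI change: {100 * (ratio - 1):+.1f}%")  # +10.5%
```

So under that assumption the patched binary retires fewer instructions but spends roughly 10% more cycles on each one on average.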

We do have recognition of the zero-stride load idiom in our design and it works for integer sources. The fact that FP performs so poorly is quite a surprise. Though this top-line behavior does match what we're seeing on the BPI as well.

I'm getting some data with perf record to see if there's perhaps something goofy going on that can be easily spotted. What doesn't make much sense here is that our LSU shouldn't really care about the underlying data types.


Jeff
