Re: [PATCH][RFC] RISC-V: Allow FP strided broadcast from memory [PR121451]

Robin Dapp Wed, 24 Sep 2025 02:21:28 -0700

> I experimented with this patch, which allows removing a vfmv when a
> floating-point operand can be loaded directly from memory with a zero-stride
> vlse.
>
> In terms of benchmarks, I measured the following reductions in icount:
> * 503.bwaves: -4.0%
> * 538.imagick: -3.3%
> * 549.fotonik3d: -0.34%
>
> However, the icount for 507.cactuBSSN increased by 0.43%. In addition, 
> measurements on the BPI board show that the patch actually increases 
> execution times by 5 to 11%.
>
> This may still be beneficial for some uarchs but would have to be 
> tunable, wouldn't it?
> Is it worth proceeding with this?


As we discussed before, icount can be treacherous, in particular with "clever"
patterns like these.  That's the reason why we made the zero-strided-load
idiom tunable and didn't try to use it everywhere.  Such a big negative swing
in real performance is still surprising, and my gut feeling is that we
stop hoisting something out of a loop.
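
For readers less familiar with the idiom, the two sequences compare roughly
as follows: a scalar fld followed by vfmv.v.f to splat the value, versus a
single vlse64.v with stride x0 that reads the same element into every lane.
The portable C below is only a semantic model; the names model_vlse64 and
model_fld_vfmv and the fixed VL of 4 are hypothetical, chosen for
illustration.  It shows that both forms yield the same vector and says
nothing about their cost on real hardware.

```c
#include <stddef.h>
#include <string.h>

#define VL 4 /* illustrative vector length (assumption) */

/* Hypothetical model of vlse64.v: element i is loaded from
   base + i * stride_bytes.  With stride 0 every lane reads *base. */
void model_vlse64(double dst[VL], const void *base, ptrdiff_t stride_bytes)
{
    const unsigned char *p = base;
    for (size_t i = 0; i < VL; i++)
        memcpy(&dst[i], p + (ptrdiff_t)i * stride_bytes, sizeof(double));
}

/* Hypothetical model of fld + vfmv.v.f: one scalar load, then a splat. */
void model_fld_vfmv(double dst[VL], const double *base)
{
    double f = *base;          /* fld      fa5, 0(a0) */
    for (size_t i = 0; i < VL; i++)
        dst[i] = f;            /* vfmv.v.f v8, fa5    */
}
```

Calling model_vlse64(v, mem, 0) or model_fld_vfmv(v, mem) leaves mem[0] in
all four lanes; a non-zero stride such as sizeof(double) turns the first one
into an ordinary strided load.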

I kind of agree that the unconditional mem handling contradicts the design 
goals.  It was the most straightforward way, though, to only use strided 
broadcasts where "absolutely" necessary.

I guess an argument can be made to have mem operands "strided broadcastable"
instead of broadcastable, but then of course for both integer and float.
Consequently, the !strided_load_broadcast fallback would need to be adjusted to
cover all modes, not only HFmode.

Also, vv -> vx and strided broadcast oppose each other to some degree.  If we 
keep the mem (which helps IRA) until late we cannot propagate, if we split 
early we don't go back to a vlse and so on.  That's all manageable but requires 
a bit of balancing and I'm not sure how useful it is from a performance 
perspective.  My mental model is that for most uarchs strided load broadcast is
at best a nop performance-wise and at worst a degradation.  Andrew mentioned
there are some that heavily favor the strided form but we'd need silicon to 
actually test that.
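
To make the vv -> vx tension concrete: once the scalar sits in an FP
register, the splat can be folded into the scalar-operand form of the
arithmetic instruction (for FP, e.g. vfadd.vf), so no vector register holds
the broadcast at all; if the operand is instead kept as a mem for a
zero-strided vlse, there is no scalar left to fold.  Again this is a semantic
model only, with hypothetical names and a fixed VL of 4 assumed for
illustration.

```c
#include <stddef.h>

#define VL 4 /* illustrative vector length (assumption) */

/* Hypothetical model of vfadd.vv: both operands are vectors, so a
   scalar operand must first be broadcast into a vector register. */
void model_vfadd_vv(double dst[VL], const double a[VL], const double b[VL])
{
    for (size_t i = 0; i < VL; i++)
        dst[i] = a[i] + b[i];
}

/* Hypothetical model of vfadd.vf: the scalar comes straight from an FP
   register, so no separate broadcast is needed. */
void model_vfadd_vf(double dst[VL], const double a[VL], double f)
{
    for (size_t i = 0; i < VL; i++)
        dst[i] = a[i] + f;
}
```

Both functions compute the same result when b holds f in every lane; the
difference the thread is about is purely which instructions and registers
the compiler ends up using to get there.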

-- 
Regards
 Robin
