On 9/25/25 4:05 AM, Paul-Antoine Arras wrote:
> Hi Jeff,
> On 23/09/2025 22:39, Jeff Law wrote:
>> On 9/23/25 1:45 PM, Paul-Antoine Arras wrote:
>>> I experimented with this patch, which makes it possible to remove a vfmv
>>> when a floating-point operand can be loaded directly from memory with a
>>> zero-stride vlse.
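A zero-stride strided load reads the same address for every element, so the load itself does the broadcast that would otherwise take a scalar load followed by a splat (presumably the vfmv.v.f that the patch removes). The C sketch below models that semantics in plain scalar code. `strided_load_f64` and `scale_vec` are hypothetical names chosen for illustration, not GCC or RVV intrinsic APIs. The codegen mentioned in the comments is an assumption about the kind of loop the patch targets, not output taken from the patch.

```c
#include <stddef.h>

/* Hypothetical scalar model of a strided vector load (vlse64.v):
   element i is read from base + i * stride_bytes. With stride_bytes == 0
   every element reads the same address, so the load itself broadcasts
   one scalar from memory into all vl elements. */
static void strided_load_f64(double *dst, const double *base,
                             ptrdiff_t stride_bytes, size_t vl)
{
    const char *p = (const char *) base;
    for (size_t i = 0; i < vl; i++)
        dst[i] = *(const double *) (p + (ptrdiff_t) i * stride_bytes);
}

/* Illustrative loop shape: the scalar operand *scale lives in memory and
   has to be present in every vector lane. Assumed codegen options:
   (a) fld + vfmv.v.f to splat it, or (b) a single vlse64.v with stride x0. */
void scale_vec(double *y, const double *x, const double *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = x[i] * *scale;
}
```

Both strategies produce identical vector contents; the trade-off is one memory operation versus a scalar load plus a register-to-vector move, and which is faster depends on how a given core implements zero-stride loads.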
>>> In terms of benchmarks, I measured the following reductions in icount:
>>> * 503.bwaves: -4.0%
>>> * 538.imagick: -3.3%
>>> * 549.fotonik3d: -0.34%
>>> However, the icount for 507.cactuBSSN increased by 0.43%. In
>>> addition, measurements on the BPI board show that the patch actually
>>> increases execution times by 5 to 11%.
>>> This may still be beneficial for some uarchs but would have to be
>>> tunable, wouldn't it?
>>> Is it worth proceeding with this?
>> It's probably worth investigating. Do you happen to have A/B binaries
>> handy still? I could throw them onto our design.
> Yes, you'll find attached the two binaries I built and tested on the BPI.
No clue why, but I'm getting faults running those binaries on our
design. The instruction pointer seems to go off into never-never land.
Given the binaries are statically linked, that's exceptionally weird.
Regardless, it seems easier if I just build things here.
jeff