On Fri, 17 Jan 2025 14:58:37 GMT, Jorn Vernee <[email protected]> wrote:
> Could you add the benchmark you're using to the PR as well?
Done. I added it to the "points" benchmark suite. I had to define a separate
"DoublePoint" struct, though, since the existing int/int pair gets packed into
a long.
Full disclosure: I'm not sure how to run it inside the JDK build structure, so
I ran it outside instead and can only hope it builds there.
`make test TEST="micro:java.lang.foreign.points"` fails with:
`Error: Unable to access jarfile
/Users/mernst/IdeaProjects/jdk/build/macosx-aarch64-server-fastdebug/images/test/micro/benchmarks.jar`
It exercises a loop like this:
    struct DoublePoint { double x; double y; }

    // HFA returned by value: requires an intermediate buffer
    DoublePoint unit_rotate(double phi);
    // reference case: no intermediate buffer
    void unit_rotate_ptr(DoublePoint* out, double phi);

    DoublePoint *points = new DoublePoint[N];
    for (i in 0...N) points[i] = unit_rotate(2*pi*i/N);
vs
    for (i in 0...N) unit_rotate_ptr(points+i, 2*pi*i/N);
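With the change, the by-value call is now almost competitive with the
by-pointer call, and the memory profile looks a lot better. The first run is
the 25-ea+3-283 build, the second is my local build of this PR: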
# VM version: JDK 25-ea, OpenJDK 64-Bit Server VM, 25-ea+3-283
Benchmark                                        Mode  Cnt      Score     Error  Units
PointsAlloc.circle_by_ptr                        avgt    5      8.964 ±   0.351  ns/op
PointsAlloc.circle_by_ptr:·gc.alloc.rate         avgt    5     95.301 ±   3.665  MB/sec
PointsAlloc.circle_by_ptr:·gc.alloc.rate.norm    avgt    5      0.224 ±   0.001  B/op
PointsAlloc.circle_by_ptr:·gc.count              avgt    5      2.000            counts
PointsAlloc.circle_by_ptr:·gc.time               avgt    5      3.000            ms
PointsAlloc.circle_by_value                      avgt    5     46.498 ±   2.336  ns/op
PointsAlloc.circle_by_value:·gc.alloc.rate       avgt    5  13141.578 ± 650.425  MB/sec
PointsAlloc.circle_by_value:·gc.alloc.rate.norm  avgt    5    160.224 ±   0.001  B/op
PointsAlloc.circle_by_value:·gc.count            avgt    5    116.000            counts
PointsAlloc.circle_by_value:·gc.time             avgt    5     44.000            ms

# VM version: JDK 25-internal, OpenJDK 64-Bit Server VM, 25-internal-adhoc.mernst.jdk
Benchmark                                        Mode  Cnt      Score     Error  Units
PointsAlloc.circle_by_ptr                        avgt    5      9.108 ±   0.477  ns/op
PointsAlloc.circle_by_ptr:·gc.alloc.rate         avgt    5     93.792 ±   4.898  MB/sec
PointsAlloc.circle_by_ptr:·gc.alloc.rate.norm    avgt    5      0.224 ±   0.001  B/op
PointsAlloc.circle_by_ptr:·gc.count              avgt    5      2.000            counts
PointsAlloc.circle_by_ptr:·gc.time               avgt    5      4.000            ms
PointsAlloc.circle_by_value                      avgt    5     13.180 ±   0.611  ns/op
PointsAlloc.circle_by_value:·gc.alloc.rate       avgt    5     64.816 ±   2.964  MB/sec
PointsAlloc.circle_by_value:·gc.alloc.rate.norm  avgt    5      0.224 ±   0.001  B/op
PointsAlloc.circle_by_value:·gc.count            avgt    5      2.000            counts
PointsAlloc.circle_by_value:·gc.time             avgt    5      5.000            ms
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23142#issuecomment-2599586149