On Fri, 9 May 2025 14:11:27 GMT, Andrew Haley <a...@openjdk.org> wrote:
> This intrinsic is generally faster than the current implementation for Panama
> segment operations for all writes larger than about 8 bytes in size,
> increasing to more than 2* the performance on larger memory blocks on
> Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe"
> (this intrinsic).
>
>
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score     Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op

Graviton 4:

Benchmark                              (ELEM_SIZE)  Mode  Cnt        Score      Error  Units
SegmentBulkFill.heapSegmentFillJava              2  avgt   10        2.324 ±    0.066  ns/op
SegmentBulkFill.heapSegmentFillJava              3  avgt   10        2.427 ±    0.031  ns/op
SegmentBulkFill.heapSegmentFillJava              4  avgt   10        2.231 ±    0.009  ns/op
SegmentBulkFill.heapSegmentFillJava              5  avgt   10        2.523 ±    0.040  ns/op
SegmentBulkFill.heapSegmentFillJava              6  avgt   10        2.632 ±    0.017  ns/op
SegmentBulkFill.heapSegmentFillJava              7  avgt   10        2.394 ±    0.007  ns/op
SegmentBulkFill.heapSegmentFillJava              8  avgt   10        3.004 ±    0.032  ns/op
SegmentBulkFill.heapSegmentFillJava             64  avgt   10        4.813 ±    0.417  ns/op
SegmentBulkFill.heapSegmentFillJava            512  avgt   10        9.151 ±    0.040  ns/op
SegmentBulkFill.heapSegmentFillJava           4096  avgt   10       60.127 ±    0.078  ns/op
SegmentBulkFill.heapSegmentFillJava          32768  avgt   10      461.292 ±    2.127  ns/op
SegmentBulkFill.heapSegmentFillJava         262144  avgt   10     3666.851 ±    0.280  ns/op
SegmentBulkFill.heapSegmentFillJava        2097152  avgt   10    35169.510 ±   22.507  ns/op
SegmentBulkFill.heapSegmentFillJava       16777216  avgt   10   227182.710 ±  903.546  ns/op
SegmentBulkFill.heapSegmentFillJava      134217728  avgt   10  1946761.410 ± 3033.447  ns/op
SegmentBulkFill.heapSegmentFillLoop              2  avgt   10        2.902 ±    0.038  ns/op
SegmentBulkFill.heapSegmentFillLoop              3  avgt   10        3.870 ±    0.004  ns/op
SegmentBulkFill.heapSegmentFillLoop              4  avgt   10        5.438 ±    0.013  ns/op
SegmentBulkFill.heapSegmentFillLoop              5  avgt   10        5.714 ±    0.033  ns/op
SegmentBulkFill.heapSegmentFillLoop              6  avgt   10        5.748 ±    0.019  ns/op
SegmentBulkFill.heapSegmentFillLoop              7  avgt   10        5.909 ±    0.004  ns/op
SegmentBulkFill.heapSegmentFillLoop              8  avgt   10        6.330 ±    0.295  ns/op
SegmentBulkFill.heapSegmentFillLoop             64  avgt   10        8.769 ±    0.003  ns/op
SegmentBulkFill.heapSegmentFillLoop            512  avgt   10       16.935 ±    0.007  ns/op
SegmentBulkFill.heapSegmentFillLoop           4096  avgt   10       57.822 ±    0.510  ns/op
SegmentBulkFill.heapSegmentFillLoop          32768  avgt   10      376.849 ±    0.311  ns/op
SegmentBulkFill.heapSegmentFillLoop         262144  avgt   10     3059.064 ±    0.419  ns/op
SegmentBulkFill.heapSegmentFillLoop        2097152  avgt   10    24398.571 ±    8.618  ns/op
SegmentBulkFill.heapSegmentFillLoop       16777216  avgt   10   225721.136 ±  608.041  ns/op
SegmentBulkFill.heapSegmentFillLoop      134217728  avgt   10  1940987.569 ± 2156.239  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            2  avgt   10        3.628 ±    0.022  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            3  avgt   10        3.670 ±    0.011  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            4  avgt   10        3.583 ±    0.002  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            5  avgt   10        3.651 ±    0.016  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            6  avgt   10        3.659 ±    0.015  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            7  avgt   10        3.687 ±    0.016  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            8  avgt   10        3.193 ±    0.022  ns/op
SegmentBulkFill.heapSegmentFillUnsafe           64  avgt   10        3.365 ±    0.034  ns/op
SegmentBulkFill.heapSegmentFillUnsafe          512  avgt   10        6.443 ±    0.006  ns/op
SegmentBulkFill.heapSegmentFillUnsafe         4096  avgt   10       48.261 ±    0.081  ns/op
SegmentBulkFill.heapSegmentFillUnsafe        32768  avgt   10      389.793 ±    0.777  ns/op
SegmentBulkFill.heapSegmentFillUnsafe       262144  avgt   10     3123.758 ±    1.048  ns/op
SegmentBulkFill.heapSegmentFillUnsafe      2097152  avgt   10    25039.904 ±   55.467  ns/op
SegmentBulkFill.heapSegmentFillUnsafe     16777216  avgt   10   223579.037 ±  306.005  ns/op
SegmentBulkFill.heapSegmentFillUnsafe    134217728  avgt   10  1931370.983 ± 1110.364  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2867002071