On Fri, 9 May 2025 14:11:27 GMT, Andrew Haley <a...@openjdk.org> wrote:

> This intrinsic is generally faster than the current implementation for Panama 
> segment operations for all writes larger than about 8 bytes in size, 
> increasing to more than 2x the performance on larger memory blocks on 
> Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" 
> (this intrinsic).
> 
> 
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op
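
As a quick check of the "more than 2x" figure in the quoted Graviton 2 results, the ratio of the "panama" score to the "unsafe" score can be worked out directly. The sketch below is only illustrative arithmetic on the four scores quoted above; the class and method names are hypothetical and are not part of the PR.

```java
public class FillSpeedup {
    // Ratio of baseline time to candidate time (both in ns/op).
    // A value above 1.0 means the candidate is faster.
    static double speedup(double baselineNsPerOp, double candidateNsPerOp) {
        return baselineNsPerOp / candidateNsPerOp;
    }

    public static void main(String[] args) {
        // Scores copied from the quoted Graviton 2 table (size = 262143).
        double aligned = speedup(7295.638, 2930.594);   // panama vs unsafe, aligned
        double unaligned = speedup(8345.300, 3136.828); // panama vs unsafe, unaligned
        System.out.println("aligned speedup:   " + aligned);
        System.out.println("unaligned speedup: " + unaligned);
    }
}
```

On these numbers the intrinsic comes out roughly 2.5x faster in the aligned case and roughly 2.7x faster in the unaligned case.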

Graviton 4:


Benchmark                              (ELEM_SIZE)  Mode  Cnt        Score      Error  Units
SegmentBulkFill.heapSegmentFillJava              2  avgt   10        2.324 ±    0.066  ns/op
SegmentBulkFill.heapSegmentFillJava              3  avgt   10        2.427 ±    0.031  ns/op
SegmentBulkFill.heapSegmentFillJava              4  avgt   10        2.231 ±    0.009  ns/op
SegmentBulkFill.heapSegmentFillJava              5  avgt   10        2.523 ±    0.040  ns/op
SegmentBulkFill.heapSegmentFillJava              6  avgt   10        2.632 ±    0.017  ns/op
SegmentBulkFill.heapSegmentFillJava              7  avgt   10        2.394 ±    0.007  ns/op
SegmentBulkFill.heapSegmentFillJava              8  avgt   10        3.004 ±    0.032  ns/op
SegmentBulkFill.heapSegmentFillJava             64  avgt   10        4.813 ±    0.417  ns/op
SegmentBulkFill.heapSegmentFillJava            512  avgt   10        9.151 ±    0.040  ns/op
SegmentBulkFill.heapSegmentFillJava           4096  avgt   10       60.127 ±    0.078  ns/op
SegmentBulkFill.heapSegmentFillJava          32768  avgt   10      461.292 ±    2.127  ns/op
SegmentBulkFill.heapSegmentFillJava         262144  avgt   10     3666.851 ±    0.280  ns/op
SegmentBulkFill.heapSegmentFillJava        2097152  avgt   10    35169.510 ±   22.507  ns/op
SegmentBulkFill.heapSegmentFillJava       16777216  avgt   10   227182.710 ±  903.546  ns/op
SegmentBulkFill.heapSegmentFillJava      134217728  avgt   10  1946761.410 ± 3033.447  ns/op
SegmentBulkFill.heapSegmentFillLoop              2  avgt   10        2.902 ±    0.038  ns/op
SegmentBulkFill.heapSegmentFillLoop              3  avgt   10        3.870 ±    0.004  ns/op
SegmentBulkFill.heapSegmentFillLoop              4  avgt   10        5.438 ±    0.013  ns/op
SegmentBulkFill.heapSegmentFillLoop              5  avgt   10        5.714 ±    0.033  ns/op
SegmentBulkFill.heapSegmentFillLoop              6  avgt   10        5.748 ±    0.019  ns/op
SegmentBulkFill.heapSegmentFillLoop              7  avgt   10        5.909 ±    0.004  ns/op
SegmentBulkFill.heapSegmentFillLoop              8  avgt   10        6.330 ±    0.295  ns/op
SegmentBulkFill.heapSegmentFillLoop             64  avgt   10        8.769 ±    0.003  ns/op
SegmentBulkFill.heapSegmentFillLoop            512  avgt   10       16.935 ±    0.007  ns/op
SegmentBulkFill.heapSegmentFillLoop           4096  avgt   10       57.822 ±    0.510  ns/op
SegmentBulkFill.heapSegmentFillLoop          32768  avgt   10      376.849 ±    0.311  ns/op
SegmentBulkFill.heapSegmentFillLoop         262144  avgt   10     3059.064 ±    0.419  ns/op
SegmentBulkFill.heapSegmentFillLoop        2097152  avgt   10    24398.571 ±    8.618  ns/op
SegmentBulkFill.heapSegmentFillLoop       16777216  avgt   10   225721.136 ±  608.041  ns/op
SegmentBulkFill.heapSegmentFillLoop      134217728  avgt   10  1940987.569 ± 2156.239  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            2  avgt   10        3.628 ±    0.022  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            3  avgt   10        3.670 ±    0.011  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            4  avgt   10        3.583 ±    0.002  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            5  avgt   10        3.651 ±    0.016  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            6  avgt   10        3.659 ±    0.015  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            7  avgt   10        3.687 ±    0.016  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            8  avgt   10        3.193 ±    0.022  ns/op
SegmentBulkFill.heapSegmentFillUnsafe           64  avgt   10        3.365 ±    0.034  ns/op
SegmentBulkFill.heapSegmentFillUnsafe          512  avgt   10        6.443 ±    0.006  ns/op
SegmentBulkFill.heapSegmentFillUnsafe         4096  avgt   10       48.261 ±    0.081  ns/op
SegmentBulkFill.heapSegmentFillUnsafe        32768  avgt   10      389.793 ±    0.777  ns/op
SegmentBulkFill.heapSegmentFillUnsafe       262144  avgt   10     3123.758 ±    1.048  ns/op
SegmentBulkFill.heapSegmentFillUnsafe      2097152  avgt   10    25039.904 ±   55.467  ns/op
SegmentBulkFill.heapSegmentFillUnsafe     16777216  avgt   10   223579.037 ±  306.005  ns/op
SegmentBulkFill.heapSegmentFillUnsafe    134217728  avgt   10  1931370.983 ± 1110.364  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2867002071
