On Fri, 9 May 2025 14:11:27 GMT, Andrew Haley <a...@openjdk.org> wrote:

> This intrinsic is generally faster than the current implementation for Panama
> segment operations for all writes larger than about 8 bytes, and on Graviton 2
> it more than doubles performance on larger memory blocks. Below, "panama" is
> the C2-generated code we use now and "unsafe" is this intrinsic.
> 
> 
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op
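As a quick check of the "more than 2x" figure, the Graviton 2 averages above can be turned into speedups by dividing the "panama" time by the "unsafe" time at the same alignment. The class and method names below are hypothetical and not part of the PR; the numbers are copied from the table, and the error columns are ignored.

```java
import java.util.Locale;

// Hypothetical helper (not from the PR): speedup of the "unsafe" intrinsic
// over the current "panama" C2 code, using the Graviton 2 averages quoted
// above (ns/op, size = 262143). Error columns are ignored.
public class GravitonSpeedup {

    // Old time divided by new time; a value above 1.0 means the intrinsic is faster.
    static double speedup(double panamaNs, double unsafeNs) {
        return panamaNs / unsafeNs;
    }

    public static void main(String[] args) {
        double aligned = speedup(7295.638, 2930.594);
        double unaligned = speedup(8345.300, 3136.828);
        // Locale.ROOT keeps the decimal separator a '.' regardless of platform locale.
        System.out.printf(Locale.ROOT, "aligned:   %.2fx%n", aligned);   // aligned:   2.49x
        System.out.printf(Locale.ROOT, "unaligned: %.2fx%n", unaligned); // unaligned: 2.66x
    }
}
```

Both ratios come out between roughly 2.5x and 2.7x, consistent with the claim for this block size.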

Apple M1:


Benchmark                              (ELEM_SIZE)  Mode  Cnt        Score       Error  Units
SegmentBulkFill.heapSegmentFillJava              2  avgt   10        1.727 ±     0.017  ns/op
SegmentBulkFill.heapSegmentFillJava              3  avgt   10        1.721 ±     0.002  ns/op
SegmentBulkFill.heapSegmentFillJava              4  avgt   10        1.876 ±     0.002  ns/op
SegmentBulkFill.heapSegmentFillJava              5  avgt   10        1.876 ±     0.001  ns/op
SegmentBulkFill.heapSegmentFillJava              6  avgt   10        1.876 ±     0.002  ns/op
SegmentBulkFill.heapSegmentFillJava              7  avgt   10        1.876 ±     0.002  ns/op
SegmentBulkFill.heapSegmentFillJava              8  avgt   10        2.502 ±     0.003  ns/op
SegmentBulkFill.heapSegmentFillJava             64  avgt   10        4.064 ±     0.002  ns/op
SegmentBulkFill.heapSegmentFillJava            512  avgt   10        6.601 ±     0.051  ns/op
SegmentBulkFill.heapSegmentFillJava           4096  avgt   10       44.050 ±     0.076  ns/op
SegmentBulkFill.heapSegmentFillJava          32768  avgt   10      330.328 ±     0.450  ns/op
SegmentBulkFill.heapSegmentFillJava         262144  avgt   10     4138.154 ±     6.509  ns/op
SegmentBulkFill.heapSegmentFillJava        2097152  avgt   10    33089.966 ±    48.068  ns/op
SegmentBulkFill.heapSegmentFillJava       16777216  avgt   10   352669.548 ±   571.433  ns/op
SegmentBulkFill.heapSegmentFillJava      134217728  avgt   10  4482510.192 ±  7177.637  ns/op
SegmentBulkFill.heapSegmentFillLoop              2  avgt   10        1.977 ±     0.003  ns/op
SegmentBulkFill.heapSegmentFillLoop              3  avgt   10        3.447 ±     0.002  ns/op
SegmentBulkFill.heapSegmentFillLoop              4  avgt   10        4.073 ±     0.042  ns/op
SegmentBulkFill.heapSegmentFillLoop              5  avgt   10        4.377 ±     0.004  ns/op
SegmentBulkFill.heapSegmentFillLoop              6  avgt   10        5.337 ±     0.071  ns/op
SegmentBulkFill.heapSegmentFillLoop              7  avgt   10        5.629 ±     0.004  ns/op
SegmentBulkFill.heapSegmentFillLoop              8  avgt   10        5.947 ±     0.010  ns/op
SegmentBulkFill.heapSegmentFillLoop             64  avgt   10        8.127 ±     0.003  ns/op
SegmentBulkFill.heapSegmentFillLoop            512  avgt   10       16.045 ±     0.027  ns/op
SegmentBulkFill.heapSegmentFillLoop           4096  avgt   10       46.627 ±     0.164  ns/op
SegmentBulkFill.heapSegmentFillLoop          32768  avgt   10      333.233 ±     1.040  ns/op
SegmentBulkFill.heapSegmentFillLoop         262144  avgt   10     4134.009 ±    11.125  ns/op
SegmentBulkFill.heapSegmentFillLoop        2097152  avgt   10    33148.671 ±   322.905  ns/op
SegmentBulkFill.heapSegmentFillLoop       16777216  avgt   10   343832.913 ±   233.881  ns/op
SegmentBulkFill.heapSegmentFillLoop      134217728  avgt   10  4475821.911 ±  6101.380  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            2  avgt   10        3.133 ±     0.034  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            3  avgt   10        3.130 ±     0.005  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            4  avgt   10        3.128 ±     0.004  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            5  avgt   10        3.139 ±     0.030  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            6  avgt   10        3.135 ±     0.035  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            7  avgt   10        3.135 ±     0.030  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            8  avgt   10        2.665 ±     0.006  ns/op
SegmentBulkFill.heapSegmentFillUnsafe           64  avgt   10        2.841 ±     0.032  ns/op
SegmentBulkFill.heapSegmentFillUnsafe          512  avgt   10        6.246 ±     0.100  ns/op
SegmentBulkFill.heapSegmentFillUnsafe         4096  avgt   10       41.241 ±     0.107  ns/op
SegmentBulkFill.heapSegmentFillUnsafe        32768  avgt   10      331.001 ±     4.521  ns/op
SegmentBulkFill.heapSegmentFillUnsafe       262144  avgt   10     3038.808 ±    29.750  ns/op
SegmentBulkFill.heapSegmentFillUnsafe      2097152  avgt   10    21996.375 ±  2617.947  ns/op
SegmentBulkFill.heapSegmentFillUnsafe     16777216  avgt   10   241814.864 ± 24300.854  ns/op
SegmentBulkFill.heapSegmentFillUnsafe    134217728  avgt   10  2811655.392 ± 24737.911  ns/op
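To read the M1 table at a glance, the sketch below divides the heapSegmentFillJava average by the heapSegmentFillUnsafe average at each size. It assumes (the comment does not say so) that heapSegmentFillUnsafe exercises the new intrinsic and heapSegmentFillJava the existing Java fill path; the class and method names are hypothetical, the scores are transcribed from the table, and the error columns are ignored.

```java
import java.util.Locale;

// Hypothetical helper (not from the PR): compares the Apple M1 averages
// above (ns/op, lower is better). Assumes heapSegmentFillUnsafe is the new
// intrinsic and heapSegmentFillJava the existing Java path. Errors ignored.
public class M1FillComparison {

    static final long[] SIZES = {2, 3, 4, 5, 6, 7, 8, 64, 512, 4096,
            32768, 262144, 2097152, 16777216, 134217728};
    static final double[] JAVA = {1.727, 1.721, 1.876, 1.876, 1.876, 1.876, 2.502,
            4.064, 6.601, 44.050, 330.328, 4138.154, 33089.966, 352669.548, 4482510.192};
    static final double[] UNSAFE = {3.133, 3.130, 3.128, 3.139, 3.135, 3.135, 2.665,
            2.841, 6.246, 41.241, 331.001, 3038.808, 21996.375, 241814.864, 2811655.392};

    // Java time divided by Unsafe time at row i; above 1.0 means Unsafe was faster.
    static double speedup(int i) {
        return JAVA[i] / UNSAFE[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf(Locale.ROOT, "%10d bytes: %.2fx%n", SIZES[i], speedup(i));
        }
    }
}
```

On these numbers the Java variant is faster for 2 to 8 bytes, and the Unsafe variant is faster from 64 bytes up (about 1.59x at 134217728 bytes). The one exception is 32768, where the difference (330.328 vs 331.001 ns/op) is smaller than the Unsafe variant's reported error of 4.521 ns/op, so it is not a meaningful difference either way.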

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2866961810
