On Thu, 22 May 2025 07:34:08 GMT, Per Minborg <pminb...@openjdk.org> wrote:

> This PR builds on a concept John Rose told me about some time ago. Instead of 
> combining memory operations of various sizes, a single large and skewed 
> memory operation can be made to clean up the tail of remaining bytes.
> 
> This has the effect of simplifying and shortening the code while improving 
> performance. The number of branches to evaluate is reduced.
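
To illustrate the idea in the quoted description, here is a minimal sketch and not the code in this PR. It assumes a fill over a `ByteBuffer`; the class `SkewedTailFill` and its `fill` helper are hypothetical names made up for this example. The bulk of the range is written in 8-byte steps, and the remaining 0..7 bytes are covered by one 8-byte store placed so that it ends exactly at the last byte, overlapping bytes that were already written.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the "skewed tail" idea; not the JDK implementation.
final class SkewedTailFill {

    // Fills buf[0, len) with value, using absolute (index-based) stores only.
    static void fill(ByteBuffer buf, int len, byte value) {
        // Broadcast the byte into all eight lanes of a long. Every lane is
        // identical, so the buffer's byte order does not affect the result.
        long v = (value & 0xFFL) * 0x0101010101010101L;
        if (len >= 8) {
            // Bulk part: whole 8-byte words from the start.
            for (int i = 0; i <= len - 8; i += 8) {
                buf.putLong(i, v);
            }
            // Tail: one skewed store ending at the last byte. It overlaps
            // bytes written above, which is harmless for a fill and replaces
            // the usual 4/2/1-byte tail branches.
            buf.putLong(len - 8, v);
        } else if (len >= 4) {
            // Two overlapping 4-byte stores cover lengths 4..7.
            buf.putInt(0, (int) v);
            buf.putInt(len - 4, (int) v);
        } else if (len >= 2) {
            // Two overlapping 2-byte stores cover lengths 2..3.
            buf.putShort(0, (short) v);
            buf.putShort(len - 2, (short) v);
        } else if (len == 1) {
            buf.put(0, value);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        fill(buf, 13, (byte) 0x7F); // one bulk store at 0, one skewed store at 5
        System.out.println(java.util.Arrays.toString(buf.array()));
    }
}
```

When `len` is a multiple of 8, the skewed store rewrites the last word redundantly; the sketch accepts that in exchange for not branching on the remainder.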

Performance on an M1 Mac (macOS Sequoia 15.4.1):

Base:


Benchmark                                (ELEM_SIZE)  Mode  Cnt        Score        Error  Units
SegmentBulkFill.nativeSegmentFillJava            2  avgt   30        1.618 ±      0.060  ns/op
SegmentBulkFill.nativeSegmentFillJava            3  avgt   30        1.602 ±      0.042  ns/op
SegmentBulkFill.nativeSegmentFillJava            4  avgt   30        1.775 ±      0.070  ns/op
SegmentBulkFill.nativeSegmentFillJava            5  avgt   30        1.759 ±      0.051  ns/op
SegmentBulkFill.nativeSegmentFillJava            6  avgt   30        1.771 ±      0.051  ns/op
SegmentBulkFill.nativeSegmentFillJava            7  avgt   30        1.785 ±      0.049  ns/op
SegmentBulkFill.nativeSegmentFillJava            8  avgt   30        2.383 ±      0.061  ns/op
SegmentBulkFill.nativeSegmentFillJava           64  avgt   30        4.010 ±      0.255  ns/op
SegmentBulkFill.nativeSegmentFillJava          512  avgt   30        6.622 ±      0.246  ns/op
SegmentBulkFill.nativeSegmentFillJava         4096  avgt   30       44.431 ±      0.832  ns/op
SegmentBulkFill.nativeSegmentFillJava        32768  avgt   30      331.429 ±      3.073  ns/op
SegmentBulkFill.nativeSegmentFillJava       262144  avgt   30     4174.795 ±     76.096  ns/op
SegmentBulkFill.nativeSegmentFillJava      2097152  avgt   30    33084.699 ±     53.530  ns/op
SegmentBulkFill.nativeSegmentFillJava     16777216  avgt   30   298953.004 ±  11241.262  ns/op
SegmentBulkFill.nativeSegmentFillJava    134217728  avgt   30  2857973.939 ± 128453.291  ns/op


Patch:

Benchmark                                (ELEM_SIZE)  Mode  Cnt        Score        Error  Units
SegmentBulkFill.nativeSegmentFillJava            2  avgt   30        1.317 ±      0.022  ns/op
SegmentBulkFill.nativeSegmentFillJava            3  avgt   30        1.313 ±      0.006  ns/op
SegmentBulkFill.nativeSegmentFillJava            4  avgt   30        1.319 ±      0.018  ns/op
SegmentBulkFill.nativeSegmentFillJava            5  avgt   30        1.317 ±      0.019  ns/op
SegmentBulkFill.nativeSegmentFillJava            6  avgt   30        1.316 ±      0.016  ns/op
SegmentBulkFill.nativeSegmentFillJava            7  avgt   30        1.320 ±      0.019  ns/op
SegmentBulkFill.nativeSegmentFillJava            8  avgt   30        2.239 ±      0.047  ns/op
SegmentBulkFill.nativeSegmentFillJava           64  avgt   30        3.487 ±      0.074  ns/op
SegmentBulkFill.nativeSegmentFillJava          512  avgt   30        6.659 ±      0.102  ns/op
SegmentBulkFill.nativeSegmentFillJava         4096  avgt   30       44.461 ±      0.666  ns/op
SegmentBulkFill.nativeSegmentFillJava        32768  avgt   30      331.159 ±      5.928  ns/op
SegmentBulkFill.nativeSegmentFillJava       262144  avgt   30     4171.649 ±     60.867  ns/op
SegmentBulkFill.nativeSegmentFillJava      2097152  avgt   30    34718.817 ±    697.494  ns/op
SegmentBulkFill.nativeSegmentFillJava     16777216  avgt   30   305446.597 ±  11087.702  ns/op
SegmentBulkFill.nativeSegmentFillJava    134217728  avgt   30  2905051.303 ± 114905.125  ns/op


![image](https://github.com/user-attachments/assets/df4888ab-67d9-49fe-982b-8018d949cee3)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25383#issuecomment-2900213674
