adriacabeza opened a new pull request, #2485:
URL: https://github.com/apache/fory/pull/2485

   ## What does this PR do?
   Currently, primitive arrays are serialized by copying the data buffer directly 
using `sun.misc.Unsafe`. After this PR, if the new config options are enabled, we 
use SIMD operations to check whether every value in the array fits in a narrower 
type and, if so, compress the values. For example:
   
   ```
   int[] → byte[] when all values ∈ [-128, 127]
   Size goes from 4B/elem → 1B/elem (≈ 75% smaller).
   
   int[] → short[] when all values ∈ [-32768, 32767] (but some fall outside 
[-128,127])
   4B/elem → 2B/elem (≈ 50% smaller).
   
   long[] → int[] when all values ∈ [Integer.MIN_VALUE, Integer.MAX_VALUE]
   8B/elem → 4B/elem (≈ 50% smaller).
   ```
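
A minimal scalar sketch of the range check behind these rules. It is not the PR's code: the PR performs the check with SIMD, and the names `NarrowingSketch`, `Width`, `narrowestWidth`, and `fitsInInt` are hypothetical, chosen only for illustration.

```java
/** Hypothetical helper, not part of Fory: scalar version of the range check. */
final class NarrowingSketch {
    /** Narrowest element type that can hold every value of an int[]. */
    enum Width { BYTE, SHORT, INT }

    /** Returns BYTE, SHORT, or INT following the ranges listed above. */
    static Width narrowestWidth(int[] values) {
        boolean fitsByte = true;
        for (int v : values) {
            if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
                return Width.INT; // a single wide value rules out narrowing
            }
            if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
                fitsByte = false; // still a candidate for short[]
            }
        }
        // An empty array trivially fits the narrowest type.
        return fitsByte ? Width.BYTE : Width.SHORT;
    }

    /** True when every long survives a round trip through int unchanged. */
    static boolean fitsInInt(long[] values) {
        for (long v : values) {
            if (v != (int) v) {
                return false;
            }
        }
        return true;
    }
}
```

For instance, `narrowestWidth(new int[] {1, 200})` yields `SHORT`, because 200 falls outside [-128, 127] but inside [-32768, 32767].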
   
   See the Benchmark section below for details on the performance impact.
   
   ## Related issues
   
   
   ## Does this PR introduce any user-facing change?
   
   Yes, it adds the `withIntArrayCompressed` and `withLongArrayCompressed` 
options to the Fory builder. This is a breaking change for previously compressed 
arrays.
   
   - [x] Does this PR introduce any public API change?
   - [ ] Does this PR introduce any binary protocol compatibility change?
   
   ## Benchmark
   Check the file 
`src/main/java/org/apache/fory/benchmark/ArrayCompressionSuite.java` to see the 
benchmark code. To run it, go to the `/benchmark` directory and:
   1. Compile the benchmark
   ```
   javac -cp "../fory-core/target/classes:$(mvn dependency:build-classpath -q 
-Dmdep.outputFile=/dev/stdout -Pjmh)" -d target/classes 
src/main/java/org/apache/fory/benchmark/ArrayCompressionSuite.java
   ```
   2. Run benchmark

   The benchmarks in `ArrayCompressionSuite` show that:
   - Compression has negligible cost: deserialization and serialization 
throughput are almost identical for compressed and uncompressed arrays, with 
most differences falling within the reported error margins.
   - Large SIMD advantage: determining the compression type with SIMD is 
roughly 17x faster than the scalar check at 100 elements and more than 10^6x 
faster at 10,000,000 elements. In these runs SIMD throughput stays between 
about 457M and 474M ops/s at every array size.
   - Memory savings: compression (e.g., int[] → byte[] gives 75% smaller 
arrays) delivers significant space reductions with no meaningful performance 
trade-off.
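
The size figures quoted above follow directly from the element widths. The sketch below shows a lossless int[] → byte[] round trip and the resulting saving. It is an illustration only, not the PR's serializer: `PackSketch`, `packToBytes`, `unpackFromBytes`, and `savedFraction` are hypothetical names.

```java
/** Hypothetical helper, not part of Fory: narrowing round trip and size math. */
final class PackSketch {
    /** Narrows each int to a byte; rejects values outside [-128, 127]. */
    static byte[] packToBytes(int[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            int v = values[i];
            if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
                throw new IllegalArgumentException("value out of byte range: " + v);
            }
            out[i] = (byte) v;
        }
        return out;
    }

    /** Widens each byte back to an int; the widening conversion sign-extends. */
    static int[] unpackFromBytes(byte[] packed) {
        int[] out = new int[packed.length];
        for (int i = 0; i < packed.length; i++) {
            out[i] = packed[i];
        }
        return out;
    }

    /** Fraction of payload bytes saved when element width shrinks. */
    static double savedFraction(int wideBytesPerElem, int narrowBytesPerElem) {
        return 1.0 - (double) narrowBytesPerElem / wideBytesPerElem;
    }
}
```

With these helpers, `savedFraction(Integer.BYTES, Byte.BYTES)` is 0.75 and `savedFraction(Long.BYTES, Integer.BYTES)` is 0.5, matching the 75% and 50% figures in the description. Any header or type tag a real wire format adds is ignored here.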
   
   ```
   Benchmark                                             (arraySize)   Mode  Cnt         Score          Error  Units
   # deserialize
   ArrayCompressionSuite.deserializeCompressedIntArray           100  thrpt    3  39337995.386 ±  5360760.131  ops/s
   ArrayCompressionSuite.deserializeNormalIntArray               100  thrpt    3  38295765.689 ±  1245305.149  ops/s

   ArrayCompressionSuite.deserializeCompressedIntArray          1000  thrpt    3   5551302.526 ±   137012.772  ops/s
   ArrayCompressionSuite.deserializeNormalIntArray              1000  thrpt    3   5437271.349 ±   607905.314  ops/s

   ArrayCompressionSuite.deserializeCompressedIntArray         10000  thrpt    3    646854.523 ±   162819.346  ops/s
   ArrayCompressionSuite.deserializeNormalIntArray             10000  thrpt    3    622439.409 ±   156114.438  ops/s

   ArrayCompressionSuite.deserializeCompressedIntArray        100000  thrpt    3     72575.767 ±     1516.885  ops/s
   ArrayCompressionSuite.deserializeNormalIntArray            100000  thrpt    3     71451.258 ±     8428.635  ops/s

   ArrayCompressionSuite.deserializeCompressedIntArray       1000000  thrpt    3      7333.304 ±     5231.423  ops/s
   ArrayCompressionSuite.deserializeNormalIntArray           1000000  thrpt    3      6763.321 ±     6402.086  ops/s

   ArrayCompressionSuite.deserializeCompressedIntArray      10000000  thrpt    3       617.388 ±       20.088  ops/s
   ArrayCompressionSuite.deserializeNormalIntArray          10000000  thrpt    3       615.323 ±        4.986  ops/s

   ArrayCompressionSuite.deserializeCompressedLongArray          100  thrpt    3  25497542.958 ±   409763.148  ops/s
   ArrayCompressionSuite.deserializeNormalLongArray              100  thrpt    3  25509274.001 ±   709963.309  ops/s

   ArrayCompressionSuite.deserializeCompressedLongArray         1000  thrpt    3    955670.779 ±  1113862.949  ops/s
   ArrayCompressionSuite.deserializeNormalLongArray             1000  thrpt    3    923666.161 ±  1873719.259  ops/s

   ArrayCompressionSuite.deserializeCompressedLongArray        10000  thrpt    3    358903.335 ±    73728.335  ops/s
   ArrayCompressionSuite.deserializeNormalLongArray            10000  thrpt    3    356203.511 ±    60149.652  ops/s

   ArrayCompressionSuite.deserializeCompressedLongArray       100000  thrpt    3     36963.012 ±      316.502  ops/s
   ArrayCompressionSuite.deserializeNormalLongArray           100000  thrpt    3     41915.010 ±    26587.942  ops/s

   ArrayCompressionSuite.deserializeCompressedLongArray      1000000  thrpt    3      3783.469 ±     1564.800  ops/s
   ArrayCompressionSuite.deserializeNormalLongArray          1000000  thrpt    3      3834.763 ±      762.397  ops/s

   ArrayCompressionSuite.deserializeCompressedLongArray     10000000  thrpt    3       323.378 ±       35.506  ops/s
   ArrayCompressionSuite.deserializeNormalLongArray         10000000  thrpt    3       332.769 ±       20.246  ops/s
   ```
   
   ```
   Benchmark                                           (arraySize)   Mode  Cnt         Score          Error  Units
   # serialize compression
   ArrayCompressionSuite.serializeCompressedIntArray           100  thrpt    3  37687667.146 ±   649284.080  ops/s
   ArrayCompressionSuite.serializeNormalIntArray               100  thrpt    3  30217468.757 ± 10126179.430  ops/s

   ArrayCompressionSuite.serializeCompressedIntArray          1000  thrpt    3  11754790.584 ±  6487913.872  ops/s
   ArrayCompressionSuite.serializeNormalIntArray              1000  thrpt    3  11455357.054 ±  1020061.625  ops/s

   ArrayCompressionSuite.serializeCompressedIntArray         10000  thrpt    3   1214061.700 ±   438135.395  ops/s
   ArrayCompressionSuite.serializeNormalIntArray             10000  thrpt    3   1149324.112 ±    64148.031  ops/s

   ArrayCompressionSuite.serializeCompressedIntArray        100000  thrpt    3    130141.077 ±    51758.369  ops/s
   ArrayCompressionSuite.serializeNormalIntArray            100000  thrpt    3    125215.129 ±    21495.372  ops/s

   ArrayCompressionSuite.serializeCompressedIntArray       1000000  thrpt    3      8399.914 ±     1440.360  ops/s
   ArrayCompressionSuite.serializeNormalIntArray           1000000  thrpt    3     10310.840 ±    18645.102  ops/s

   ArrayCompressionSuite.serializeCompressedIntArray      10000000  thrpt    3       753.918 ±       97.084  ops/s
   ArrayCompressionSuite.serializeNormalIntArray          10000000  thrpt    3       764.189 ±       79.020  ops/s

   ArrayCompressionSuite.serializeCompressedLongArray          100  thrpt    3  39784594.200 ±  2699436.390  ops/s
   ArrayCompressionSuite.serializeNormalLongArray              100  thrpt    3  40270803.004 ±   864536.346  ops/s

   ArrayCompressionSuite.serializeCompressedLongArray         1000  thrpt    3   6477672.225 ±   686396.963  ops/s
   ArrayCompressionSuite.serializeNormalLongArray             1000  thrpt    3   6545549.418 ±  1019871.296  ops/s

   ArrayCompressionSuite.serializeCompressedLongArray        10000  thrpt    3    617183.974 ±    55659.620  ops/s
   ArrayCompressionSuite.serializeNormalLongArray            10000  thrpt    3    635196.999 ±   149026.050  ops/s

   ArrayCompressionSuite.serializeCompressedLongArray       100000  thrpt    3     53462.482 ±    35638.133  ops/s
   ArrayCompressionSuite.serializeNormalLongArray           100000  thrpt    3     52004.432 ±    23252.382  ops/s

   ArrayCompressionSuite.serializeCompressedLongArray      1000000  thrpt    3      6026.037 ±    16709.968  ops/s
   ArrayCompressionSuite.serializeNormalLongArray          1000000  thrpt    3      5552.551 ±      466.382  ops/s

   ArrayCompressionSuite.serializeCompressedLongArray     10000000  thrpt    3       429.205 ±      105.545  ops/s
   ArrayCompressionSuite.serializeNormalLongArray         10000000  thrpt    3       429.779 ±       45.038  ops/s
   ```
   
   ```
   Benchmark                                                    (arraySize)   Mode  Cnt          Score          Error  Units
   # determine compression type
   ArrayCompressionSuite.determineIntArrayCompressionTypeSIMD           100  thrpt    3  466402307.389 ±  2555334.745  ops/s
   ArrayCompressionSuite.determineIntCompressionTypeScalar              100  thrpt    3   26427884.309 ±   962450.255  ops/s

   ArrayCompressionSuite.determineIntArrayCompressionTypeSIMD          1000  thrpt    3  466744123.120 ± 12228234.474  ops/s
   ArrayCompressionSuite.determineIntCompressionTypeScalar             1000  thrpt    3    2757905.535 ±     6773.933  ops/s

   ArrayCompressionSuite.determineIntArrayCompressionTypeSIMD         10000  thrpt    3  467446373.509 ±  2354677.669  ops/s
   ArrayCompressionSuite.determineIntCompressionTypeScalar            10000  thrpt    3     281539.120 ±    32260.205  ops/s

   ArrayCompressionSuite.determineIntArrayCompressionTypeSIMD        100000  thrpt    3  467496144.240 ±  5141955.783  ops/s
   ArrayCompressionSuite.determineIntCompressionTypeScalar           100000  thrpt    3      28332.792 ±     1172.081  ops/s

   ArrayCompressionSuite.determineIntArrayCompressionTypeSIMD       1000000  thrpt    3  457251328.679 ± 67039627.791  ops/s
   ArrayCompressionSuite.determineIntCompressionTypeScalar          1000000  thrpt    3       2814.572 ±       83.414  ops/s

   ArrayCompressionSuite.determineIntArrayCompressionTypeSIMD      10000000  thrpt    3  465654341.115 ± 15778486.010  ops/s
   ArrayCompressionSuite.determineIntCompressionTypeScalar         10000000  thrpt    3        280.622 ±        4.857  ops/s

   ArrayCompressionSuite.determineLongArrayCompressionTypeSIMD          100  thrpt    3  473362303.009 ± 19310541.495  ops/s
   ArrayCompressionSuite.determineLongCompressionTypeScalar             100  thrpt    3   28186064.314 ±   504616.686  ops/s

   ArrayCompressionSuite.determineLongArrayCompressionTypeSIMD         1000  thrpt    3  467564131.215 ± 76322855.052  ops/s
   ArrayCompressionSuite.determineLongCompressionTypeScalar            1000  thrpt    3    2918411.197 ±    44589.191  ops/s

   ArrayCompressionSuite.determineLongArrayCompressionTypeSIMD        10000  thrpt    3  472648299.254 ± 21619910.950  ops/s
   ArrayCompressionSuite.determineLongCompressionTypeScalar           10000  thrpt    3     300527.267 ±      956.579  ops/s

   ArrayCompressionSuite.determineLongArrayCompressionTypeSIMD       100000  thrpt    3  473998295.136 ± 11659405.795  ops/s
   ArrayCompressionSuite.determineLongCompressionTypeScalar          100000  thrpt    3      30140.257 ±       25.836  ops/s

   ArrayCompressionSuite.determineLongArrayCompressionTypeSIMD      1000000  thrpt    3  474356048.064 ± 28979532.447  ops/s
   ArrayCompressionSuite.determineLongCompressionTypeScalar         1000000  thrpt    3       2972.073 ±       66.983  ops/s

   ArrayCompressionSuite.determineLongArrayCompressionTypeSIMD     10000000  thrpt    3  471544347.771 ± 99677479.793  ops/s
   ArrayCompressionSuite.determineLongCompressionTypeScalar        10000000  thrpt    3        298.815 ±        0.287  ops/s
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

