AntoinePrv commented on PR #46963:
URL: https://github.com/apache/arrow/pull/46963#issuecomment-3027767478

   Benchmark results on my MacBook Pro M3:
   `archery benchmark diff --suite-filter=parquet-encoding --benchmark-filter='ByteStreamSplit' --cmake-extras -DARROW_RUNTIME_SIMD_LEVEL=MAX`
   
   <details>
    <summary>Show macOS benchmark results</summary>
    
    ```
    
    ------------------------------------------------------------------------------------------------------------------------
    Non-regressions: (40)
    ------------------------------------------------------------------------------------------------------------------------
                                              benchmark       baseline      contender  change %  counters
          BM_ByteStreamSplitEncode_FLBA_Generic<2>/1024 14.616 GiB/sec 16.862 GiB/sec    15.369  {'family_index': 7, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<2>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5353278}
               BM_ByteStreamSplitEncode_Int16_Neon/1024 15.094 GiB/sec 16.849 GiB/sec    11.624  {'family_index': 17, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Int16_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5639613}
               BM_ByteStreamSplitDecode_Int16_Neon/1024 76.466 GiB/sec 81.758 GiB/sec     6.921  {'family_index': 14, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Int16_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 28472993}
          BM_ByteStreamSplitEncode_Double_Generic/65536 11.811 GiB/sec 12.443 GiB/sec     5.355  {'family_index': 6, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17260}
               BM_ByteStreamSplitDecode_Float_Neon/1024 11.478 GiB/sec 11.877 GiB/sec     3.472  {'family_index': 15, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2111983}
              BM_ByteStreamSplitDecode_Int16_Neon/65536 96.162 GiB/sec 97.298 GiB/sec     1.181  {'family_index': 14, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitDecode_Int16_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 549407}
              BM_ByteStreamSplitEncode_Float_Neon/65536 12.181 GiB/sec 12.287 GiB/sec     0.866  {'family_index': 18, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitEncode_Float_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 35309}
           BM_ByteStreamSplitEncode_Double_Scalar/65536  5.652 GiB/sec  5.678 GiB/sec     0.454  {'family_index': 13, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 8133}
              BM_ByteStreamSplitDecode_Float_Neon/65536 11.545 GiB/sec 11.596 GiB/sec     0.441  {'family_index': 15, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitDecode_Float_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 33051}
              BM_ByteStreamSplitEncode_Int16_Neon/65536 15.074 GiB/sec 15.122 GiB/sec     0.321  {'family_index': 17, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitEncode_Int16_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 86239}
           BM_ByteStreamSplitDecode_Float_Generic/65536 11.509 GiB/sec 11.545 GiB/sec     0.311  {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 33235}
            BM_ByteStreamSplitDecode_Double_Scalar/1024  6.866 GiB/sec  6.877 GiB/sec     0.152  {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 631980}
         BM_ByteStreamSplitDecode_FLBA_Generic<7>/65536  6.544 GiB/sec  6.552 GiB/sec     0.121  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<7>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10737}
        BM_ByteStreamSplitDecode_FLBA_Generic<16>/65536  5.397 GiB/sec  5.401 GiB/sec     0.079  {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<16>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3878}
             BM_ByteStreamSplitDecode_Double_Neon/65536  9.073 GiB/sec  9.075 GiB/sec     0.020  {'family_index': 16, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitDecode_Double_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12939}
         BM_ByteStreamSplitDecode_FLBA_Generic<16>/1024  5.852 GiB/sec  5.853 GiB/sec     0.006  {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<16>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268754}
             BM_ByteStreamSplitEncode_Double_Neon/65536 12.614 GiB/sec 12.613 GiB/sec    -0.010  {'family_index': 19, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitEncode_Double_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17997}
               BM_ByteStreamSplitEncode_Float_Neon/1024 12.705 GiB/sec 12.688 GiB/sec    -0.135  {'family_index': 18, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2339385}
          BM_ByteStreamSplitDecode_Double_Generic/65536  9.071 GiB/sec  9.056 GiB/sec    -0.165  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13016}
             BM_ByteStreamSplitDecode_Float_Scalar/1024  6.869 GiB/sec  6.858 GiB/sec    -0.167  {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1257523}
         BM_ByteStreamSplitEncode_FLBA_Generic<7>/65536  5.752 GiB/sec  5.739 GiB/sec    -0.237  {'family_index': 8, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<7>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9461}
            BM_ByteStreamSplitEncode_Double_Scalar/1024  5.749 GiB/sec  5.735 GiB/sec    -0.242  {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 529373}
         BM_ByteStreamSplitEncode_FLBA_Generic<16>/1024  5.836 GiB/sec  5.821 GiB/sec    -0.261  {'family_index': 9, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<16>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268614}
          BM_ByteStreamSplitEncode_FLBA_Generic<7>/1024  5.854 GiB/sec  5.839 GiB/sec    -0.266  {'family_index': 8, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<7>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 612654}
           BM_ByteStreamSplitEncode_Float_Generic/65536 12.246 GiB/sec 12.211 GiB/sec    -0.289  {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 35389}
          BM_ByteStreamSplitDecode_FLBA_Generic<7>/1024  6.809 GiB/sec  6.786 GiB/sec    -0.341  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<7>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 714206}
            BM_ByteStreamSplitDecode_Float_Scalar/65536  6.893 GiB/sec  6.866 GiB/sec    -0.398  {'family_index': 10, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 19681}
        BM_ByteStreamSplitEncode_FLBA_Generic<16>/65536  3.240 GiB/sec  3.226 GiB/sec    -0.433  {'family_index': 9, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<16>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2300}
            BM_ByteStreamSplitEncode_Float_Scalar/65536  5.674 GiB/sec  5.643 GiB/sec    -0.545  {'family_index': 12, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 16275}
            BM_ByteStreamSplitEncode_Float_Generic/1024 12.632 GiB/sec 12.553 GiB/sec    -0.622  {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2324022}
         BM_ByteStreamSplitEncode_FLBA_Generic<2>/65536 15.117 GiB/sec 15.011 GiB/sec    -0.702  {'family_index': 7, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<2>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85173}
             BM_ByteStreamSplitEncode_Float_Scalar/1024  5.760 GiB/sec  5.715 GiB/sec    -0.784  {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1058889}
         BM_ByteStreamSplitDecode_FLBA_Generic<2>/65536 97.112 GiB/sec 96.282 GiB/sec    -0.854  {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<2>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 557480}
              BM_ByteStreamSplitDecode_Double_Neon/1024  9.904 GiB/sec  9.786 GiB/sec    -1.192  {'family_index': 16, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 909079}
           BM_ByteStreamSplitDecode_Double_Scalar/65536  6.345 GiB/sec  6.269 GiB/sec    -1.195  {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9048}
            BM_ByteStreamSplitDecode_Float_Generic/1024 11.745 GiB/sec 11.591 GiB/sec    -1.313  {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2150961}
           BM_ByteStreamSplitDecode_Double_Generic/1024  9.898 GiB/sec  9.723 GiB/sec    -1.767  {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 910391}
              BM_ByteStreamSplitEncode_Double_Neon/1024 17.664 GiB/sec 17.341 GiB/sec    -1.826  {'family_index': 19, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1618389}
           BM_ByteStreamSplitEncode_Double_Generic/1024 17.171 GiB/sec 16.815 GiB/sec    -2.078  {'family_index': 6, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1566816}
          BM_ByteStreamSplitDecode_FLBA_Generic<2>/1024 81.316 GiB/sec 78.479 GiB/sec    -3.489  {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<2>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 29913379}
    ```
    
    </details>
    
    Benchmark results on my Linux cloud instance:
    `archery benchmark diff --suite-filter=parquet-encoding --benchmark-filter='ByteStreamSplit' --cmake-extras -DARROW_RUNTIME_SIMD_LEVEL=MAX --cmake-extras -DARROW_SIMD_LEVEL=AVX2`
   
   <details>
    <summary>Show Linux AVX2 benchmark results</summary>
    
    ```
    Upcoming...
    ```
    </details>


