AntoinePrv commented on PR #46963:
URL: https://github.com/apache/arrow/pull/46963#issuecomment-3027767478
Benchmark results on my MacBook Pro M3:
`archery benchmark diff --suite-filter=parquet-encoding --benchmark-filter='ByteStreamSplit' --cmake-extras -DARROW_RUNTIME_SIMD_LEVEL=MAX`
<details>
<summary>Show macOS benchmark results</summary>
```
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (40)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark                                              baseline       contender  change %  counters
BM_ByteStreamSplitEncode_FLBA_Generic<2>/1024    14.616 GiB/sec  16.862 GiB/sec    15.369  {'family_index': 7, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<2>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5353278}
BM_ByteStreamSplitEncode_Int16_Neon/1024         15.094 GiB/sec  16.849 GiB/sec    11.624  {'family_index': 17, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Int16_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5639613}
BM_ByteStreamSplitDecode_Int16_Neon/1024         76.466 GiB/sec  81.758 GiB/sec     6.921  {'family_index': 14, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Int16_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 28472993}
BM_ByteStreamSplitEncode_Double_Generic/65536    11.811 GiB/sec  12.443 GiB/sec     5.355  {'family_index': 6, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17260}
BM_ByteStreamSplitDecode_Float_Neon/1024         11.478 GiB/sec  11.877 GiB/sec     3.472  {'family_index': 15, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2111983}
BM_ByteStreamSplitDecode_Int16_Neon/65536        96.162 GiB/sec  97.298 GiB/sec     1.181  {'family_index': 14, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitDecode_Int16_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 549407}
BM_ByteStreamSplitEncode_Float_Neon/65536        12.181 GiB/sec  12.287 GiB/sec     0.866  {'family_index': 18, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitEncode_Float_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 35309}
BM_ByteStreamSplitEncode_Double_Scalar/65536      5.652 GiB/sec   5.678 GiB/sec     0.454  {'family_index': 13, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 8133}
BM_ByteStreamSplitDecode_Float_Neon/65536        11.545 GiB/sec  11.596 GiB/sec     0.441  {'family_index': 15, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitDecode_Float_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 33051}
BM_ByteStreamSplitEncode_Int16_Neon/65536        15.074 GiB/sec  15.122 GiB/sec     0.321  {'family_index': 17, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitEncode_Int16_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 86239}
BM_ByteStreamSplitDecode_Float_Generic/65536     11.509 GiB/sec  11.545 GiB/sec     0.311  {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 33235}
BM_ByteStreamSplitDecode_Double_Scalar/1024       6.866 GiB/sec   6.877 GiB/sec     0.152  {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 631980}
BM_ByteStreamSplitDecode_FLBA_Generic<7>/65536    6.544 GiB/sec   6.552 GiB/sec     0.121  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<7>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10737}
BM_ByteStreamSplitDecode_FLBA_Generic<16>/65536   5.397 GiB/sec   5.401 GiB/sec     0.079  {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<16>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3878}
BM_ByteStreamSplitDecode_Double_Neon/65536        9.073 GiB/sec   9.075 GiB/sec     0.020  {'family_index': 16, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitDecode_Double_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12939}
BM_ByteStreamSplitDecode_FLBA_Generic<16>/1024    5.852 GiB/sec   5.853 GiB/sec     0.006  {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<16>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268754}
BM_ByteStreamSplitEncode_Double_Neon/65536       12.614 GiB/sec  12.613 GiB/sec    -0.010  {'family_index': 19, 'per_family_instance_index': 3, 'run_name': 'BM_ByteStreamSplitEncode_Double_Neon/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17997}
BM_ByteStreamSplitEncode_Float_Neon/1024         12.705 GiB/sec  12.688 GiB/sec    -0.135  {'family_index': 18, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2339385}
BM_ByteStreamSplitDecode_Double_Generic/65536     9.071 GiB/sec   9.056 GiB/sec    -0.165  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13016}
BM_ByteStreamSplitDecode_Float_Scalar/1024        6.869 GiB/sec   6.858 GiB/sec    -0.167  {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1257523}
BM_ByteStreamSplitEncode_FLBA_Generic<7>/65536    5.752 GiB/sec   5.739 GiB/sec    -0.237  {'family_index': 8, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<7>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9461}
BM_ByteStreamSplitEncode_Double_Scalar/1024       5.749 GiB/sec   5.735 GiB/sec    -0.242  {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 529373}
BM_ByteStreamSplitEncode_FLBA_Generic<16>/1024    5.836 GiB/sec   5.821 GiB/sec    -0.261  {'family_index': 9, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<16>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268614}
BM_ByteStreamSplitEncode_FLBA_Generic<7>/1024     5.854 GiB/sec   5.839 GiB/sec    -0.266  {'family_index': 8, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<7>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 612654}
BM_ByteStreamSplitEncode_Float_Generic/65536     12.246 GiB/sec  12.211 GiB/sec    -0.289  {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 35389}
BM_ByteStreamSplitDecode_FLBA_Generic<7>/1024     6.809 GiB/sec   6.786 GiB/sec    -0.341  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<7>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 714206}
BM_ByteStreamSplitDecode_Float_Scalar/65536       6.893 GiB/sec   6.866 GiB/sec    -0.398  {'family_index': 10, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 19681}
BM_ByteStreamSplitEncode_FLBA_Generic<16>/65536   3.240 GiB/sec   3.226 GiB/sec    -0.433  {'family_index': 9, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<16>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2300}
BM_ByteStreamSplitEncode_Float_Scalar/65536       5.674 GiB/sec   5.643 GiB/sec    -0.545  {'family_index': 12, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 16275}
BM_ByteStreamSplitEncode_Float_Generic/1024      12.632 GiB/sec  12.553 GiB/sec    -0.622  {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2324022}
BM_ByteStreamSplitEncode_FLBA_Generic<2>/65536   15.117 GiB/sec  15.011 GiB/sec    -0.702  {'family_index': 7, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<2>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 85173}
BM_ByteStreamSplitEncode_Float_Scalar/1024        5.760 GiB/sec   5.715 GiB/sec    -0.784  {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1058889}
BM_ByteStreamSplitDecode_FLBA_Generic<2>/65536   97.112 GiB/sec  96.282 GiB/sec    -0.854  {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<2>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 557480}
BM_ByteStreamSplitDecode_Double_Neon/1024         9.904 GiB/sec   9.786 GiB/sec    -1.192  {'family_index': 16, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 909079}
BM_ByteStreamSplitDecode_Double_Scalar/65536      6.345 GiB/sec   6.269 GiB/sec    -1.195  {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 9048}
BM_ByteStreamSplitDecode_Float_Generic/1024      11.745 GiB/sec  11.591 GiB/sec    -1.313  {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2150961}
BM_ByteStreamSplitDecode_Double_Generic/1024      9.898 GiB/sec   9.723 GiB/sec    -1.767  {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 910391}
BM_ByteStreamSplitEncode_Double_Neon/1024        17.664 GiB/sec  17.341 GiB/sec    -1.826  {'family_index': 19, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Neon/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1618389}
BM_ByteStreamSplitEncode_Double_Generic/1024     17.171 GiB/sec  16.815 GiB/sec    -2.078  {'family_index': 6, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1566816}
BM_ByteStreamSplitDecode_FLBA_Generic<2>/1024    81.316 GiB/sec  78.479 GiB/sec    -3.489  {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<2>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 29913379}
```
</details>
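For reading the table: the `change %` column appears to be the relative throughput difference of the contender (this PR) over the baseline, so positive means faster. A minimal Python sketch under that assumption is below; small discrepancies against the table come from the rounded GiB/sec figures. `change_percent`, `is_regression` and the 5% threshold are hypothetical illustrations, not archery's actual code or defaults.

```python
# Hypothetical helpers (not archery's code): recompute the "change %" column
# from the two throughput columns. Throughput is higher-is-better, so a
# positive change means the contender is faster than the baseline.


def change_percent(baseline: float, contender: float) -> float:
    """Relative change of contender vs. baseline, in percent."""
    return (contender - baseline) / baseline * 100.0


def is_regression(change_pct: float, threshold_pct: float = 5.0) -> bool:
    """Flag a throughput drop larger than threshold_pct.

    The 5% default is an assumption chosen for illustration only.
    """
    return change_pct < -threshold_pct


# A few rows copied from the macOS table (GiB/sec).
rows = {
    "BM_ByteStreamSplitEncode_FLBA_Generic<2>/1024": (14.616, 16.862),
    "BM_ByteStreamSplitEncode_Int16_Neon/1024": (15.094, 16.849),
    "BM_ByteStreamSplitDecode_FLBA_Generic<2>/1024": (81.316, 78.479),
}

for name, (base, cont) in rows.items():
    pct = change_percent(base, cont)
    print(f"{name}: {pct:+.2f}% regression={is_regression(pct)}")
```

Even the largest drop in the table (about -3.5% for `Decode_FLBA_Generic<2>/1024`) stays under such a 5% threshold, which is consistent with all 40 entries being listed as non-regressions.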
Benchmark results on my Linux cloud instance:
`archery benchmark diff --suite-filter=parquet-encoding --benchmark-filter='ByteStreamSplit' --cmake-extras -DARROW_RUNTIME_SIMD_LEVEL=MAX --cmake-extras -DARROW_SIMD_LEVEL=AVX2`
<details>
<summary>Show Linux AVX2 benchmark results</summary>
```
Upcoming...
```
</details>
--
This is an automated message from the Apache Git Service.