pitrou commented on PR #46789:
URL: https://github.com/apache/arrow/pull/46789#issuecomment-2995463508
Local benchmark results on my AMD Ryzen 9 3900X CPU:
```
------------------------------------------------------------------------------------------------------------------------
Non-regressions: (40)
------------------------------------------------------------------------------------------------------------------------
benchmark                                              baseline       contender  change %  counters
BM_ByteStreamSplitDecode_FLBA_Generic<2>/1024     4.020 GiB/sec  59.711 GiB/sec  1385.469  {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<2>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1463689}
BM_ByteStreamSplitDecode_FLBA_Generic<2>/65536    4.057 GiB/sec  53.022 GiB/sec  1206.800  {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<2>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23272}
BM_ByteStreamSplitEncode_FLBA_Generic<2>/65536    4.690 GiB/sec   7.451 GiB/sec    58.859  {'family_index': 7, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<2>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 27316}
BM_ByteStreamSplitEncode_FLBA_Generic<2>/1024     4.777 GiB/sec   7.398 GiB/sec    54.878  {'family_index': 7, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<2>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1749779}
BM_ByteStreamSplitEncode_Double_Generic/1024      7.294 GiB/sec   8.597 GiB/sec    17.874  {'family_index': 6, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 654439}
BM_ByteStreamSplitDecode_Float_Sse2/1024          7.816 GiB/sec   8.696 GiB/sec    11.247  {'family_index': 14, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Sse2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1478390}
BM_ByteStreamSplitEncode_Double_Sse2/1024         7.656 GiB/sec   8.466 GiB/sec    10.575  {'family_index': 17, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Sse2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 685857}
BM_ByteStreamSplitDecode_Float_Sse2/65536         7.368 GiB/sec   8.091 GiB/sec     9.813  {'family_index': 14, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Sse2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21078}
BM_ByteStreamSplitEncode_Double_Sse2/65536        7.303 GiB/sec   7.784 GiB/sec     6.596  {'family_index': 17, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Sse2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10413}
BM_ByteStreamSplitEncode_Double_Generic/65536     7.498 GiB/sec   7.918 GiB/sec     5.600  {'family_index': 6, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10468}
BM_ByteStreamSplitDecode_Double_Sse2/65536        8.489 GiB/sec   8.720 GiB/sec     2.720  {'family_index': 15, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Sse2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12255}
BM_ByteStreamSplitDecode_Double_Sse2/1024         9.153 GiB/sec   9.301 GiB/sec     1.619  {'family_index': 15, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Sse2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 850920}
BM_ByteStreamSplitEncode_FLBA_Generic<16>/1024    5.002 GiB/sec   5.059 GiB/sec     1.142  {'family_index': 9, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<16>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 228207}
BM_ByteStreamSplitDecode_Double_Avx2/65536       13.154 GiB/sec  13.270 GiB/sec     0.886  {'family_index': 19, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Avx2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 18728}
BM_ByteStreamSplitEncode_Float_Avx2/1024         12.957 GiB/sec  13.062 GiB/sec     0.809  {'family_index': 20, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Avx2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2420490}
BM_ByteStreamSplitDecode_Float_Avx2/1024         19.404 GiB/sec  19.527 GiB/sec     0.634  {'family_index': 18, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Avx2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3547412}
BM_ByteStreamSplitDecode_Double_Avx2/1024        13.644 GiB/sec  13.690 GiB/sec     0.338  {'family_index': 19, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Avx2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1263795}
BM_ByteStreamSplitEncode_Float_Generic/65536     13.313 GiB/sec  13.346 GiB/sec     0.244  {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 38128}
BM_ByteStreamSplitDecode_Float_Scalar/65536       4.038 GiB/sec   4.046 GiB/sec     0.199  {'family_index': 10, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11709}
BM_ByteStreamSplitEncode_FLBA_Generic<16>/65536   4.740 GiB/sec   4.748 GiB/sec     0.164  {'family_index': 9, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<16>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3418}
BM_ByteStreamSplitEncode_Double_Scalar/65536      5.002 GiB/sec   4.999 GiB/sec    -0.058  {'family_index': 13, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7119}
BM_ByteStreamSplitDecode_Double_Scalar/65536      3.959 GiB/sec   3.956 GiB/sec    -0.078  {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5572}
BM_ByteStreamSplitEncode_FLBA_Generic<7>/1024     4.996 GiB/sec   4.993 GiB/sec    -0.078  {'family_index': 8, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<7>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 522897}
BM_ByteStreamSplitEncode_Float_Avx2/65536        13.052 GiB/sec  13.017 GiB/sec    -0.269  {'family_index': 20, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Avx2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 37065}
BM_ByteStreamSplitDecode_Float_Avx2/65536        19.584 GiB/sec  19.527 GiB/sec    -0.293  {'family_index': 18, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Avx2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 55415}
BM_ByteStreamSplitDecode_Double_Scalar/1024       4.056 GiB/sec   4.038 GiB/sec    -0.436  {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 373026}
BM_ByteStreamSplitEncode_FLBA_Generic<7>/65536    4.972 GiB/sec   4.940 GiB/sec    -0.649  {'family_index': 8, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<7>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 8158}
BM_ByteStreamSplitDecode_Float_Generic/1024      19.630 GiB/sec  19.501 GiB/sec    -0.657  {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3536526}
BM_ByteStreamSplitDecode_FLBA_Generic<7>/1024     4.030 GiB/sec   4.000 GiB/sec    -0.730  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<7>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 422470}
BM_ByteStreamSplitDecode_Double_Generic/65536    13.394 GiB/sec  13.293 GiB/sec    -0.753  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 19177}
BM_ByteStreamSplitDecode_Float_Scalar/1024        4.044 GiB/sec   4.013 GiB/sec    -0.767  {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 728013}
BM_ByteStreamSplitEncode_Double_Scalar/1024       5.140 GiB/sec   5.098 GiB/sec    -0.813  {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 465638}
BM_ByteStreamSplitDecode_Double_Generic/1024     13.988 GiB/sec  13.866 GiB/sec    -0.868  {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1285102}
BM_ByteStreamSplitEncode_Float_Scalar/1024        5.112 GiB/sec   5.062 GiB/sec    -0.975  {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 938352}
BM_ByteStreamSplitDecode_Float_Generic/65536     20.086 GiB/sec  19.843 GiB/sec    -1.209  {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 57481}
BM_ByteStreamSplitEncode_Float_Scalar/65536       5.080 GiB/sec   5.014 GiB/sec    -1.289  {'family_index': 12, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14556}
BM_ByteStreamSplitDecode_FLBA_Generic<7>/65536    3.992 GiB/sec   3.939 GiB/sec    -1.339  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<7>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6540}
BM_ByteStreamSplitEncode_Float_Generic/1024      13.396 GiB/sec  13.206 GiB/sec    -1.422  {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2450449}
BM_ByteStreamSplitDecode_FLBA_Generic<16>/1024    4.034 GiB/sec   3.900 GiB/sec    -3.329  {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<16>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 185049}
BM_ByteStreamSplitDecode_FLBA_Generic<16>/65536   3.871 GiB/sec   3.689 GiB/sec    -4.711  {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<16>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2777}
------------------------------------------------------------------------------------------------------------------------
Regressions: (2)
------------------------------------------------------------------------------------------------------------------------
benchmark                                              baseline       contender  change %  counters
BM_ByteStreamSplitEncode_Float_Sse2/1024         11.229 GiB/sec   8.120 GiB/sec   -27.690  {'family_index': 16, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Sse2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2049401}
BM_ByteStreamSplitEncode_Float_Sse2/65536        11.278 GiB/sec   7.951 GiB/sec   -29.498  {'family_index': 16, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Sse2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 25535}
```
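For anyone wanting to sanity-check the `change %` column, here is a minimal sketch. It assumes the column is the relative difference of contender to baseline in percent, which is my reading of the numbers and not something the output states. The helper name `change_percent` is hypothetical, and the inputs are the two regression rows copied from the table.

```python
def change_percent(baseline: float, contender: float) -> float:
    """Relative change of contender versus baseline, in percent.

    Assumption: this is how the `change %` column is derived; it is
    inferred from the reported numbers, not taken from the tool's source.
    """
    return (contender - baseline) / baseline * 100.0


# (baseline GiB/sec, contender GiB/sec, reported change %) for the two regressions.
regressions = {
    "BM_ByteStreamSplitEncode_Float_Sse2/1024": (11.229, 8.120, -27.690),
    "BM_ByteStreamSplitEncode_Float_Sse2/65536": (11.278, 7.951, -29.498),
}

for name, (baseline, contender, reported) in regressions.items():
    computed = change_percent(baseline, contender)
    print(f"{name}: computed {computed:.3f}%, reported {reported:.3f}%")
```

The recomputed values will not match the reported ones to the last digit, since the throughputs shown in the table are already rounded to three decimals.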