pitrou commented on PR #46789: URL: https://github.com/apache/arrow/pull/46789#issuecomment-2995463508
Local benchmark results on my AMD Ryzen 9 3900X CPU:

```
------------------------------------------------------------------------------------------------------------------------
Non-regressions: (40)
------------------------------------------------------------------------------------------------------------------------
benchmark                                              baseline       contender  change %  counters
BM_ByteStreamSplitDecode_FLBA_Generic<2>/1024     4.020 GiB/sec  59.711 GiB/sec  1385.469  {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<2>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1463689}
BM_ByteStreamSplitDecode_FLBA_Generic<2>/65536    4.057 GiB/sec  53.022 GiB/sec  1206.800  {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<2>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 23272}
BM_ByteStreamSplitEncode_FLBA_Generic<2>/65536    4.690 GiB/sec   7.451 GiB/sec    58.859  {'family_index': 7, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<2>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 27316}
BM_ByteStreamSplitEncode_FLBA_Generic<2>/1024     4.777 GiB/sec   7.398 GiB/sec    54.878  {'family_index': 7, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<2>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1749779}
BM_ByteStreamSplitEncode_Double_Generic/1024      7.294 GiB/sec   8.597 GiB/sec    17.874  {'family_index': 6, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 654439}
BM_ByteStreamSplitDecode_Float_Sse2/1024          7.816 GiB/sec   8.696 GiB/sec    11.247  {'family_index': 14, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Sse2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1478390}
BM_ByteStreamSplitEncode_Double_Sse2/1024         7.656 GiB/sec   8.466 GiB/sec    10.575  {'family_index': 17, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Sse2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 685857}
BM_ByteStreamSplitDecode_Float_Sse2/65536         7.368 GiB/sec   8.091 GiB/sec     9.813  {'family_index': 14, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Sse2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 21078}
BM_ByteStreamSplitEncode_Double_Sse2/65536        7.303 GiB/sec   7.784 GiB/sec     6.596  {'family_index': 17, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Sse2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10413}
BM_ByteStreamSplitEncode_Double_Generic/65536     7.498 GiB/sec   7.918 GiB/sec     5.600  {'family_index': 6, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10468}
BM_ByteStreamSplitDecode_Double_Sse2/65536        8.489 GiB/sec   8.720 GiB/sec     2.720  {'family_index': 15, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Sse2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12255}
BM_ByteStreamSplitDecode_Double_Sse2/1024         9.153 GiB/sec   9.301 GiB/sec     1.619  {'family_index': 15, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Sse2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 850920}
BM_ByteStreamSplitEncode_FLBA_Generic<16>/1024    5.002 GiB/sec   5.059 GiB/sec     1.142  {'family_index': 9, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<16>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 228207}
BM_ByteStreamSplitDecode_Double_Avx2/65536       13.154 GiB/sec  13.270 GiB/sec     0.886  {'family_index': 19, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Avx2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 18728}
BM_ByteStreamSplitEncode_Float_Avx2/1024         12.957 GiB/sec  13.062 GiB/sec     0.809  {'family_index': 20, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Avx2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2420490}
BM_ByteStreamSplitDecode_Float_Avx2/1024         19.404 GiB/sec  19.527 GiB/sec     0.634  {'family_index': 18, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Avx2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3547412}
BM_ByteStreamSplitDecode_Double_Avx2/1024        13.644 GiB/sec  13.690 GiB/sec     0.338  {'family_index': 19, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Avx2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1263795}
BM_ByteStreamSplitEncode_Float_Generic/65536     13.313 GiB/sec  13.346 GiB/sec     0.244  {'family_index': 5, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 38128}
BM_ByteStreamSplitDecode_Float_Scalar/65536       4.038 GiB/sec   4.046 GiB/sec     0.199  {'family_index': 10, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11709}
BM_ByteStreamSplitEncode_FLBA_Generic<16>/65536   4.740 GiB/sec   4.748 GiB/sec     0.164  {'family_index': 9, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<16>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3418}
BM_ByteStreamSplitEncode_Double_Scalar/65536      5.002 GiB/sec   4.999 GiB/sec    -0.058  {'family_index': 13, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Double_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7119}
BM_ByteStreamSplitDecode_Double_Scalar/65536      3.959 GiB/sec   3.956 GiB/sec    -0.078  {'family_index': 11, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 5572}
BM_ByteStreamSplitEncode_FLBA_Generic<7>/1024     4.996 GiB/sec   4.993 GiB/sec    -0.078  {'family_index': 8, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<7>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 522897}
BM_ByteStreamSplitEncode_Float_Avx2/65536        13.052 GiB/sec  13.017 GiB/sec    -0.269  {'family_index': 20, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Avx2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 37065}
BM_ByteStreamSplitDecode_Float_Avx2/65536        19.584 GiB/sec  19.527 GiB/sec    -0.293  {'family_index': 18, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Avx2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 55415}
BM_ByteStreamSplitDecode_Double_Scalar/1024       4.056 GiB/sec   4.038 GiB/sec    -0.436  {'family_index': 11, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 373026}
BM_ByteStreamSplitEncode_FLBA_Generic<7>/65536    4.972 GiB/sec   4.940 GiB/sec    -0.649  {'family_index': 8, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_FLBA_Generic<7>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 8158}
BM_ByteStreamSplitDecode_Float_Generic/1024      19.630 GiB/sec  19.501 GiB/sec    -0.657  {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 3536526}
BM_ByteStreamSplitDecode_FLBA_Generic<7>/1024     4.030 GiB/sec   4.000 GiB/sec    -0.730  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<7>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 422470}
BM_ByteStreamSplitDecode_Double_Generic/65536    13.394 GiB/sec  13.293 GiB/sec    -0.753  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Double_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 19177}
BM_ByteStreamSplitDecode_Float_Scalar/1024        4.044 GiB/sec   4.013 GiB/sec    -0.767  {'family_index': 10, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Float_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 728013}
BM_ByteStreamSplitEncode_Double_Scalar/1024       5.140 GiB/sec   5.098 GiB/sec    -0.813  {'family_index': 13, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Double_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 465638}
BM_ByteStreamSplitDecode_Double_Generic/1024     13.988 GiB/sec  13.866 GiB/sec    -0.868  {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_Double_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1285102}
BM_ByteStreamSplitEncode_Float_Scalar/1024        5.112 GiB/sec   5.062 GiB/sec    -0.975  {'family_index': 12, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Scalar/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 938352}
BM_ByteStreamSplitDecode_Float_Generic/65536     20.086 GiB/sec  19.843 GiB/sec    -1.209  {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_Float_Generic/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 57481}
BM_ByteStreamSplitEncode_Float_Scalar/65536       5.080 GiB/sec   5.014 GiB/sec    -1.289  {'family_index': 12, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Scalar/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14556}
BM_ByteStreamSplitDecode_FLBA_Generic<7>/65536    3.992 GiB/sec   3.939 GiB/sec    -1.339  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<7>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 6540}
BM_ByteStreamSplitEncode_Float_Generic/1024      13.396 GiB/sec  13.206 GiB/sec    -1.422  {'family_index': 5, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Generic/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2450449}
BM_ByteStreamSplitDecode_FLBA_Generic<16>/1024    4.034 GiB/sec   3.900 GiB/sec    -3.329  {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<16>/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 185049}
BM_ByteStreamSplitDecode_FLBA_Generic<16>/65536   3.871 GiB/sec   3.689 GiB/sec    -4.711  {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitDecode_FLBA_Generic<16>/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2777}
------------------------------------------------------------------------------------------------------------------------
Regressions: (2)
------------------------------------------------------------------------------------------------------------------------
benchmark                                              baseline       contender  change %  counters
BM_ByteStreamSplitEncode_Float_Sse2/1024         11.229 GiB/sec   8.120 GiB/sec   -27.690  {'family_index': 16, 'per_family_instance_index': 0, 'run_name': 'BM_ByteStreamSplitEncode_Float_Sse2/1024', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2049401}
BM_ByteStreamSplitEncode_Float_Sse2/65536        11.278 GiB/sec   7.951 GiB/sec   -29.498  {'family_index': 16, 'per_family_instance_index': 1, 'run_name': 'BM_ByteStreamSplitEncode_Float_Sse2/65536', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 25535}
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org