zanmato1984 commented on PR #43832:
URL: https://github.com/apache/arrow/pull/43832#issuecomment-2322961505
This is from my other desktop (Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz,
Coffee Lake). It shows a similar symptom, possibly because it is Coffee Lake
like my MBP.
The scalar version:
```
ARROW_USER_SIMD_LEVEL=NONE ./arrow-acero-hash-join-benchmark --benchmark_filter="BM_RowArray"
2024-09-01T00:32:49+08:00
Running ./arrow-acero-hash-join-benchmark
Run on (8 X 4900 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 12288 KiB (x1)
Load Average: 0.46, 3.08, 2.34
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------------------------------------------
BM_RowArray_Decode/"boolean" 345809 ns 345761 ns 1896 rows/sec=189.538M/s
BM_RowArray_Decode/"int8" 267577 ns 267553 ns 2678 rows/sec=244.942M/s
BM_RowArray_Decode/"int16" 237106 ns 237094 ns 2872 rows/sec=276.409M/s
BM_RowArray_Decode/"int32" 243701 ns 243697 ns 2874 rows/sec=268.92M/s
BM_RowArray_Decode/"int64" 239891 ns 239886 ns 2709 rows/sec=273.192M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:3 316511 ns 316471 ns 2260 rows/sec=207.081M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:5 310797 ns 310759 ns 2165 rows/sec=210.887M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:6 324059 ns 324020 ns 2251 rows/sec=202.256M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:7 311799 ns 311753 ns 2244 rows/sec=210.214M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:9 364401 ns 364346 ns 2016 rows/sec=179.87M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:16 349918 ns 349868 ns 1997 rows/sec=187.313M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:42 507058 ns 506962 ns 1427 rows/sec=129.27M/s
BM_RowArray_DecodeBinary/max_length:32 1261872 ns 1261465 ns 554 rows/sec=51.9515M/s
BM_RowArray_DecodeBinary/max_length:64 1585243 ns 1584698 ns 462 rows/sec=41.3549M/s
BM_RowArray_DecodeBinary/max_length:128 1822727 ns 1822343 ns 384 rows/sec=35.962M/s
BM_RowArray_DecodeOneOfColumns/"fixed_length_row:{boolean,int32,fixed_size_binary(64)}"/column:0 379210 ns 379150 ns 1843 rows/sec=172.847M/s
BM_RowArray_DecodeOneOfColumns/"fixed_length_row:{boolean,int32,fixed_size_binary(64)}"/column:1 275680 ns 275657 ns 2693 rows/sec=237.741M/s
BM_RowArray_DecodeOneOfColumns/"fixed_length_row:{boolean,int32,fixed_size_binary(64)}"/column:2 599291 ns 599291 ns 1257 rows/sec=109.354M/s
BM_RowArray_DecodeOneOfColumns/"var_length_row:{boolean,int32,utf8,utf8}"/column:0 506824 ns 506710 ns 1376 rows/sec=129.334M/s
BM_RowArray_DecodeOneOfColumns/"var_length_row:{boolean,int32,utf8,utf8}"/column:1 360611 ns 360579 ns 2123 rows/sec=181.75M/s
BM_RowArray_DecodeOneOfColumns/"var_length_row:{boolean,int32,utf8,utf8}"/column:2 1182248 ns 1181939 ns 603 rows/sec=55.447M/s
BM_RowArray_DecodeOneOfColumns/"var_length_row:{boolean,int32,utf8,utf8}"/column:3 1395220 ns 1394817 ns 529 rows/sec=46.9847M/s
```
The AVX2 version:
```
./arrow-acero-hash-join-benchmark --benchmark_filter="BM_RowArray"
2024-09-01T00:33:14+08:00
Running ./arrow-acero-hash-join-benchmark
Run on (8 X 4900 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 12288 KiB (x1)
Load Average: 0.64, 2.91, 2.31
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------------------------------------------
BM_RowArray_Decode/"boolean" 262395 ns 262341 ns 2665 rows/sec=249.808M/s
BM_RowArray_Decode/"int8" 263405 ns 263397 ns 2716 rows/sec=248.807M/s
BM_RowArray_Decode/"int16" 248155 ns 248106 ns 2821 rows/sec=264.141M/s
BM_RowArray_Decode/"int32" 257523 ns 257519 ns 2825 rows/sec=254.486M/s
BM_RowArray_Decode/"int64" 245070 ns 245020 ns 2824 rows/sec=267.468M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:3 330801 ns 330759 ns 1980 rows/sec=198.135M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:5 327874 ns 327839 ns 2134 rows/sec=199.9M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:6 331278 ns 331242 ns 1947 rows/sec=197.846M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:7 328647 ns 328611 ns 2112 rows/sec=199.43M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:9 335129 ns 335101 ns 1937 rows/sec=195.568M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:16 347641 ns 347601 ns 2097 rows/sec=188.535M/s
BM_RowArray_DecodeFixedSizeBinary/fixed_size:42 408356 ns 408265 ns 1731 rows/sec=160.521M/s
BM_RowArray_DecodeBinary/max_length:32 985453 ns 985190 ns 716 rows/sec=66.5202M/s
BM_RowArray_DecodeBinary/max_length:64 1250078 ns 1249727 ns 560 rows/sec=52.4394M/s
BM_RowArray_DecodeBinary/max_length:128 1467264 ns 1466902 ns 474 rows/sec=44.6758M/s
BM_RowArray_DecodeOneOfColumns/"fixed_length_row:{boolean,int32,fixed_size_binary(64)}"/column:0 266468 ns 266456 ns 2365 rows/sec=245.95M/s
BM_RowArray_DecodeOneOfColumns/"fixed_length_row:{boolean,int32,fixed_size_binary(64)}"/column:1 246552 ns 246557 ns 2803 rows/sec=265.8M/s
BM_RowArray_DecodeOneOfColumns/"fixed_length_row:{boolean,int32,fixed_size_binary(64)}"/column:2 437251 ns 437236 ns 1504 rows/sec=149.885M/s
BM_RowArray_DecodeOneOfColumns/"var_length_row:{boolean,int32,utf8,utf8}"/column:0 455065 ns 455005 ns 1603 rows/sec=144.031M/s
BM_RowArray_DecodeOneOfColumns/"var_length_row:{boolean,int32,utf8,utf8}"/column:1 445927 ns 445798 ns 1560 rows/sec=147.006M/s
BM_RowArray_DecodeOneOfColumns/"var_length_row:{boolean,int32,utf8,utf8}"/column:2 1033287 ns 1032913 ns 702 rows/sec=63.4468M/s
BM_RowArray_DecodeOneOfColumns/"var_length_row:{boolean,int32,utf8,utf8}"/column:3 1193991 ns 1193373 ns 544 rows/sec=54.9158M/s
```
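
To compare the two runs row by row, a small script can pull the `rows/sec` counter out of each output and report the AVX2/scalar ratio. This is only a minimal sketch and not part of the PR: the regex assumes the console format shown above, the function names `parse_rows_per_sec` and `speedups` are hypothetical, and only three rows from each run are copied in as sample input.

```python
import re

# One benchmark row: name, real time, CPU time, iterations, rows/sec counter.
# \s+ also matches newlines, so rows wrapped across several lines still parse.
ROW = re.compile(
    r'(BM_\S+)\s+(\d+) ns\s+(\d+) ns\s+(\d+)\s+rows/sec=([\d.]+)M/s'
)


def parse_rows_per_sec(output: str) -> dict[str, float]:
    """Map benchmark name -> throughput in millions of rows per second."""
    return {m.group(1): float(m.group(5)) for m in ROW.finditer(output)}


def speedups(scalar: str, avx2: str) -> dict[str, float]:
    """AVX2 throughput divided by scalar throughput, per benchmark name."""
    s = parse_rows_per_sec(scalar)
    a = parse_rows_per_sec(avx2)
    return {name: a[name] / s[name] for name in s if name in a}


# Three rows copied from the scalar run above.
SCALAR = '''
BM_RowArray_Decode/"boolean" 345809 ns 345761 ns 1896 rows/sec=189.538M/s
BM_RowArray_Decode/"int32" 243701 ns 243697 ns 2874 rows/sec=268.92M/s
BM_RowArray_DecodeBinary/max_length:32 1261872 ns 1261465 ns 554 rows/sec=51.9515M/s
'''

# The same three rows copied from the AVX2 run above.
AVX2 = '''
BM_RowArray_Decode/"boolean" 262395 ns 262341 ns 2665 rows/sec=249.808M/s
BM_RowArray_Decode/"int32" 257523 ns 257519 ns 2825 rows/sec=254.486M/s
BM_RowArray_DecodeBinary/max_length:32 985453 ns 985190 ns 716 rows/sec=66.5202M/s
'''

if __name__ == "__main__":
    for name, ratio in speedups(SCALAR, AVX2).items():
        print(f"{name}: {ratio:.2f}x")
```

A ratio below 1 means the AVX2 path had lower throughput than the scalar path for that benchmark on this machine, as is the case for `int32` in the rows above.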