AlenkaF commented on PR #40867:
URL: https://github.com/apache/arrow/pull/40867#issuecomment-2025342730
I have run the benchmarks for uniform data types on this branch. The benchmarks have not yet been updated to keep using the column-major layout as before (`row_major=false`); they now use the row-major layout that this PR makes the default. The diff is therefore measuring the difference between column-major (baseline) and row-major (contender) conversion, which is great to see:
```
(pyarrow-dev) alenkafrim@Alenkas-MacBook-Pro arrow % archery --quiet benchmark diff --benchmark-filter=BatchToTensorSimple
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (1)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark                                                     baseline         contender        change %   counters
BatchToTensorSimple<Int64Type>/size:65536/num_columns:300     1.257 GiB/sec    1.206 GiB/sec    -4.048     {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14562}
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (23)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark                                                     baseline         contender        change %   counters
BatchToTensorSimple<Int32Type>/size:65536/num_columns:300     1.251 GiB/sec    1.160 GiB/sec    -7.312     {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14214}
BatchToTensorSimple<Int16Type>/size:65536/num_columns:300     1.219 GiB/sec    1.002 GiB/sec    -17.740    {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14454}
BatchToTensorSimple<Int64Type>/size:65536/num_columns:30      9.126 GiB/sec    7.340 GiB/sec    -19.563    {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 102624}
BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300   11.804 GiB/sec   7.882 GiB/sec    -33.223    {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2156}
BatchToTensorSimple<Int64Type>/size:65536/num_columns:3       26.569 GiB/sec   17.379 GiB/sec   -34.590    {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 369358}
BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3     14.526 GiB/sec   8.555 GiB/sec    -41.104    {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2579}
BatchToTensorSimple<Int32Type>/size:65536/num_columns:30      9.299 GiB/sec    5.168 GiB/sec    -44.419    {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 103448}
BatchToTensorSimple<Int8Type>/size:65536/num_columns:300      1.240 GiB/sec    660.038 MiB/sec  -48.011    {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13673}
BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3     14.236 GiB/sec   6.438 GiB/sec    -54.776    {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2583}
BatchToTensorSimple<Int16Type>/size:65536/num_columns:30      9.152 GiB/sec    3.796 GiB/sec    -58.521    {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 105352}
BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30    13.652 GiB/sec   5.379 GiB/sec    -60.597    {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2499}
BatchToTensorSimple<Int32Type>/size:65536/num_columns:3       27.147 GiB/sec   8.674 GiB/sec    -68.049    {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 343999}
BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3     14.370 GiB/sec   4.404 GiB/sec    -69.348    {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2673}
BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300   12.017 GiB/sec   3.332 GiB/sec    -72.269    {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2158}
BatchToTensorSimple<Int16Type>/size:65536/num_columns:3       24.767 GiB/sec   5.370 GiB/sec    -78.317    {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 332207}
BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30    13.938 GiB/sec   2.928 GiB/sec    -78.994    {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2446}
BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30    12.799 GiB/sec   2.006 GiB/sec    -84.327    {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2448}
BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300   12.092 GiB/sec   1.859 GiB/sec    -84.624    {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2196}
BatchToTensorSimple<Int8Type>/size:65536/num_columns:30       9.130 GiB/sec    1.236 GiB/sec    -86.461    {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 103747}
BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3      13.566 GiB/sec   1.263 GiB/sec    -90.691    {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2480}
BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30     13.245 GiB/sec   939.018 MiB/sec  -93.077    {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2366}
BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300    11.459 GiB/sec   702.520 MiB/sec  -94.013    {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2029}
BatchToTensorSimple<Int8Type>/size:65536/num_columns:3        29.609 GiB/sec   1.391 GiB/sec    -95.302    {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 294453}
```
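For reference, a minimal pyarrow sketch of the two layouts these numbers compare, assuming the `RecordBatch.to_tensor(row_major=...)` keyword this PR adds (the benchmarks themselves exercise the C++ `RecordBatch::ToTensor` path directly; the batch below is just an illustrative fixture):

```python
import pyarrow as pa

# Small uniform-type batch, similar in spirit to what BatchToTensorSimple builds.
batch = pa.record_batch(
    [pa.array(range(6), type=pa.int64()) for _ in range(3)],
    names=["a", "b", "c"],
)

# Row-major conversion: the default proposed in this PR (contender).
row_major = batch.to_tensor()
# Column-major conversion: the previous behaviour (baseline).
col_major = batch.to_tensor(row_major=False)

# The data is identical; only the memory layout (strides) differs.
print(row_major.strides)  # e.g. (24, 8) for a 6 x 3 int64 tensor
print(col_major.strides)  # e.g. (8, 48)
```

So the slowdowns above come purely from the layout change, and the old column-major behaviour stays reachable by passing `row_major=False` explicitly.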