jorgecarleitao opened a new pull request #9329:
URL: https://github.com/apache/arrow/pull/9329
This PR is another experiment to gather feedback and share results.
# Rationale
Currently, all our arrays hold an `Arc<ArrayData>`, which they expose via
`Array::data` and `Array::data_ref`. This adds a level of indirection. As far as
I know, nothing in the current code base actually needs that `Arc`.
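
To make the indirection concrete, here is a minimal sketch of the two layouts. `ArrayData`, `ArcArray` and `OwnedArray` below are hypothetical, heavily simplified stand-ins, not arrow's actual types. The real `ArrayData` also carries a data type, a null bitmap, an offset and child data.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for arrow's `ArrayData`.
#[derive(Debug, Clone, PartialEq)]
struct ArrayData {
    len: usize,
    // Assumption: buffers are reference-counted, so cloning `ArrayData`
    // copies buffer handles rather than the bytes they point to.
    buffers: Vec<Arc<Vec<u8>>>,
}

// Layout before this PR: the array reaches its data through an `Arc`.
struct ArcArray {
    data: Arc<ArrayData>,
}

// Layout proposed in this PR: the array owns its `ArrayData` inline.
struct OwnedArray {
    data: ArrayData,
}

impl ArcArray {
    fn len(&self) -> usize {
        // Follows the `Arc` pointer to the heap-allocated `ArrayData` first.
        self.data.len
    }
}

impl OwnedArray {
    fn len(&self) -> usize {
        // Reads the field directly; `ArrayData` is stored inside the array.
        self.data.len
    }
}

fn main() {
    let data = ArrayData { len: 3, buffers: vec![Arc::new(vec![1u8, 2, 3])] };
    let before = ArcArray { data: Arc::new(data.clone()) };
    let after = OwnedArray { data };
    assert_eq!(before.len(), after.len());
    println!("len = {}", after.len());
}
```

Both layouts answer the same questions. The difference is only where `ArrayData` lives: in its own heap allocation behind a pointer, or inline in the array struct.
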
In #9271, where we are observing some performance issues with small arrays, one
of the ideas that came up was to get rid of the `Arc` and see what happens.
# This PR
This PR replaces every `Arc<ArrayData>` with `ArrayData`. On the one hand,
this makes cloning an array a tad more expensive (cloning an `ArrayData`
instead of an `Arc`), although we seldom clone an `Arc<ArrayData>`. On the other
hand, it often lets the compiler optimize more aggressively, since many
operations never leave the stack.
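
The clone trade-off can be sketched with the same kind of hypothetical, simplified `ArrayData`. This sketch assumes that buffers are themselves reference-counted (arrow's `Buffer` holds its bytes behind an `Arc`). Under that assumption:

- Cloning an `Arc<ArrayData>` is a single reference-count increment.
- Cloning an `ArrayData` allocates a new `Vec` of buffer handles and bumps each buffer's count. The real struct would also clone its data type and child data.

In neither case are the buffer bytes copied. The function name `clone_both_ways` is illustrative only.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for arrow's `ArrayData`.
#[derive(Debug, Clone)]
struct ArrayData {
    len: usize,
    // Assumption: each buffer is shared behind an `Arc`.
    buffers: Vec<Arc<Vec<u8>>>,
}

/// Clones the data both ways. Returns the buffer's strong count, whether
/// the `Arc` clones share one `ArrayData`, and whether the owned clones
/// still share the same bytes.
fn clone_both_ways() -> (usize, bool, bool) {
    let buffer = Arc::new(vec![0u8; 1024]);
    let data = ArrayData { len: 1024, buffers: vec![Arc::clone(&buffer)] };

    // `Arc<ArrayData>`: cloning only bumps the outer reference count.
    let shared = Arc::new(data.clone());
    let shared_clone = Arc::clone(&shared);
    let same_data = Arc::ptr_eq(&shared, &shared_clone);

    // `ArrayData`: cloning allocates a new `Vec` of handles and bumps each
    // buffer's count, but the 1024 bytes themselves are not copied.
    let owned_clone = data.clone();
    let same_bytes = Arc::ptr_eq(&data.buffers[0], &owned_clone.buffers[0]);

    // Strong references to the buffer: `buffer`, `data`, the `ArrayData`
    // inside `shared`, and `owned_clone` (the outer `Arc` clone adds none).
    (Arc::strong_count(&buffer), same_data, same_bytes)
}

fn main() {
    let (count, same_data, same_bytes) = clone_both_ways();
    println!("buffer refs = {count}, shared data = {same_data}, shared bytes = {same_bytes}");
}
```

So the extra cost of cloning an owned `ArrayData` is proportional to the number of buffers (and, in the real struct, child arrays), not to the length of the array.
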
The gist of the benchmarks below is:
* ~10%-20% improvement across almost every benchmark
* ~20%-100% improvement in `take`
There is some noise: some benchmarks that this change should not affect still
show differences. I am trying to reduce it by running the benchmarks at night.
Also, the arrow tests pass, but I have not ported this change to the other
crates or to the SIMD code paths.
Personally, besides the speed, I like this PR because it makes working with
`ArrayData` and arrays much simpler: no more `Arc::new`, `as_ref`, and
company.
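
As a small illustration of the ergonomics, consider a zero-copy slice. The names `ArrayData::slice`, `slice_arc` and `slice_owned` are hypothetical, not arrow's API, and `ArrayData` is again a simplified stand-in. The `Arc` version has to go through `as_ref` and wrap its result in a fresh `Arc::new`, which is an extra heap allocation. The owned version just returns the struct by value.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for arrow's `ArrayData`.
#[derive(Debug, Clone, PartialEq)]
struct ArrayData {
    offset: usize,
    len: usize,
    buffers: Vec<Arc<Vec<u8>>>,
}

impl ArrayData {
    // Zero-copy slice: shares the buffers, only adjusts offset and length.
    fn slice(&self, offset: usize, len: usize) -> ArrayData {
        assert!(offset + len <= self.len, "slice out of bounds");
        ArrayData { offset: self.offset + offset, len, buffers: self.buffers.clone() }
    }
}

// With `Arc<ArrayData>`: reach through `as_ref`, then box the result again.
fn slice_arc(data: &Arc<ArrayData>, offset: usize, len: usize) -> Arc<ArrayData> {
    Arc::new(data.as_ref().slice(offset, len))
}

// With owned `ArrayData`: the sliced data is simply returned by value.
fn slice_owned(data: &ArrayData, offset: usize, len: usize) -> ArrayData {
    data.slice(offset, len)
}

fn main() {
    let data = ArrayData { offset: 0, len: 8, buffers: vec![Arc::new((0u8..8).collect())] };
    let a = slice_arc(&Arc::new(data.clone()), 2, 4);
    let b = slice_owned(&data, 2, 4);
    assert_eq!(*a, b);
    println!("offset={} len={}", b.offset, b.len);
}
```
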
# Questions
* Does anyone know why we use `Arc<ArrayData>` in all arrays?
* Do you foresee an issue with removing the `Arc`?
* Would someone be so kind as to run the benchmarks independently, just to be
sure?
# Benchmarks
```bash
# In Cargo.toml, add `bench = false` to the [lib] section, so that `cargo bench`
# does not pass criterion's flags (e.g. `--save-baseline`) to the lib's libtest harness
git checkout master
cargo bench --benches -- --save-baseline "$(git branch --show-current)-$(git rev-parse --short HEAD)"
git checkout arcless
cargo bench --benches -- --save-baseline "$(git branch --show-current)-$(git rev-parse --short HEAD)"
```
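```bash
# show only the benchmarks whose timings differ by at least 10%
critcmp arcless-3dbcaca49 master-437c8c944 -t 10
```
In each row, the faster of the two runs is normalized to 1.00, and the other run shows how many times slower it was.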
```
group                                       arcless-3dbcaca49                     master-437c8c944
-----                                       -----------------                     ----------------
add_nulls_512                               1.00      509.1±98.69ns    ? B/sec    1.14      581.9±98.83ns    ? B/sec
array_from_vec 128                          1.00    1180.2±304.50ns    ? B/sec    1.20    1411.2±475.00ns    ? B/sec
array_slice 128                             1.00     383.5±430.02ns    ? B/sec    1.23      471.3±94.42ns    ? B/sec
array_slice 2048                            1.00      393.3±94.60ns    ? B/sec    1.22     478.2±293.70ns    ? B/sec
array_slice 512                             1.00      393.0±77.38ns    ? B/sec    1.26     496.1±145.06ns    ? B/sec
array_string_from_vec 128                   1.00         3.5±0.39µs    ? B/sec    1.22         4.3±1.68µs    ? B/sec
array_string_from_vec 256                   1.00         4.4±0.46µs    ? B/sec    1.11         4.9±0.94µs    ? B/sec
buffer_bit_ops or                           1.11     706.1±102.45ns    ? B/sec    1.00      633.3±56.65ns    ? B/sec
cast date32 to date64 512                   1.00         7.2±0.85µs    ? B/sec    1.11         8.0±1.25µs    ? B/sec
cast int32 to float64 512                   1.00         3.0±0.35µs    ? B/sec    1.16         3.4±0.77µs    ? B/sec
cast int32 to int64 512                     1.00         3.3±0.41µs    ? B/sec    1.11         3.6±0.67µs    ? B/sec
cast time32s to time32ms 512                1.13    1864.9±585.48ns    ? B/sec    1.00    1648.3±145.69ns    ? B/sec
cast timestamp_ms to i64 512                1.00      350.6±67.17ns    ? B/sec    1.35      474.9±72.34ns    ? B/sec
cast timestamp_ms to timestamp_ns 512       1.00         2.3±0.28µs    ? B/sec    1.22         2.8±4.01µs    ? B/sec
cast utf8 to date64 512                     1.22       96.5±22.75µs    ? B/sec    1.00        79.1±6.32µs    ? B/sec
concat str 1024                             1.00         9.9±0.41µs    ? B/sec    1.34        13.2±5.26µs    ? B/sec
eq scalar Float32                           1.25       82.9±33.34µs    ? B/sec    1.00        66.4±6.96µs    ? B/sec
equal_512                                   1.00       28.1±18.85ns    ? B/sec    1.76       49.5±12.56ns    ? B/sec
equal_bool_512                              1.00        21.0±4.26ns    ? B/sec    1.97        41.5±2.29ns    ? B/sec
equal_bool_513                              1.00       28.3±25.49ns    ? B/sec    1.60        45.4±3.84ns    ? B/sec
equal_nulls_512                             1.00         2.4±0.19µs    ? B/sec    1.10         2.7±0.37µs    ? B/sec
equal_string_512                            1.00       98.3±21.85ns    ? B/sec    1.14       111.9±5.30ns    ? B/sec
equal_string_nulls_512                      1.00         3.7±0.46µs    ? B/sec    1.10         4.0±1.29µs    ? B/sec
filter context f32                          1.00      503.9±14.15µs    ? B/sec    1.11      561.8±68.59µs    ? B/sec
filter context f32 high selectivity         1.00       309.3±8.26µs    ? B/sec    1.11      343.2±57.50µs    ? B/sec
filter context string high selectivity      1.00     1111.6±36.80µs    ? B/sec    1.11    1238.6±174.82µs    ? B/sec
filter context u8                           1.00      238.9±12.44µs    ? B/sec    1.18     281.0±112.36µs    ? B/sec
filter context u8 low selectivity           1.17         2.4±1.25µs    ? B/sec    1.00         2.1±0.26µs    ? B/sec
filter context u8 w NULLs                   1.00      526.7±98.56µs    ? B/sec    1.11     586.5±155.02µs    ? B/sec
filter context u8 w NULLs high selectivity  1.00       298.7±6.11µs    ? B/sec    1.15      344.2±63.42µs    ? B/sec
filter context u8 w NULLs low selectivity   1.00         2.7±0.21µs    ? B/sec    1.31         3.5±5.02µs    ? B/sec
filter f32                                  1.00      802.9±44.38µs    ? B/sec    1.21     971.4±135.07µs    ? B/sec
filter u8 low selectivity                   1.00         7.8±0.23µs    ? B/sec    1.14         8.9±1.47µs    ? B/sec
from_slice                                  1.13    1665.5±215.62µs    ? B/sec    1.00    1475.4±324.38µs    ? B/sec
from_slice prepared                         1.31    1092.8±125.08µs    ? B/sec    1.00      834.2±59.41µs    ? B/sec
gt Float32                                  1.00       71.8±30.69µs    ? B/sec    1.18        84.6±9.31µs    ? B/sec
gt scalar Float32                           1.82       74.1±25.74µs    ? B/sec    1.00        40.8±4.06µs    ? B/sec
gt_eq scalar Float32                        1.14       71.8±24.75µs    ? B/sec    1.00        63.0±7.38µs    ? B/sec
json_list_primitive_to_record_batch         1.00        64.6±4.38µs    ? B/sec    1.14       73.4±11.01µs    ? B/sec
length                                      1.00         2.9±0.07µs    ? B/sec    1.29         3.7±0.77µs    ? B/sec
like_utf8 scalar ends with                  1.00      243.8±15.05µs    ? B/sec    1.21      294.5±45.77µs    ? B/sec
like_utf8 scalar equals                     1.21      101.6±13.08µs    ? B/sec    1.00       83.7±10.13µs    ? B/sec
limit 512, 512                              1.00      327.3±60.96ns    ? B/sec    1.38     451.1±130.11ns    ? B/sec
lt Float32                                  1.00        69.4±7.77µs    ? B/sec    1.27       88.3±12.27µs    ? B/sec
lt scalar Float32                           1.00        66.5±4.71µs    ? B/sec    1.31       87.3±25.97µs    ? B/sec
lt_eq Float32                               1.00      118.9±19.95µs    ? B/sec    1.26      149.9±56.97µs    ? B/sec
max nulls 512                               1.00    1567.2±204.81ns    ? B/sec    1.16    1820.6±260.46ns    ? B/sec
min nulls 512                               1.00    1706.9±802.12ns    ? B/sec    1.16    1980.7±259.55ns    ? B/sec
min nulls string 512                        1.00         7.5±0.44µs    ? B/sec    1.18         8.9±1.66µs    ? B/sec
min string 512                              1.00         5.6±0.30µs    ? B/sec    1.10         6.2±1.42µs    ? B/sec
multiply 512                                1.00     541.6±224.47ns    ? B/sec    1.11     601.0±102.46ns    ? B/sec
mutable                                     1.11     630.0±349.07µs    ? B/sec    1.00      566.7±77.31µs    ? B/sec
mutable str 1024                            1.00     1515.5±59.36µs    ? B/sec    1.17    1776.6±329.88µs    ? B/sec
neq Float32                                 1.00       77.8±16.48µs    ? B/sec    1.18       91.5±20.30µs    ? B/sec
neq scalar Float32                          1.52      101.2±52.55µs    ? B/sec    1.00        66.4±6.21µs    ? B/sec
sort 2^10                                   1.00       147.2±4.97µs    ? B/sec    1.19      175.6±25.84µs    ? B/sec
sort 2^12                                   1.00      726.9±50.41µs    ? B/sec    1.17     849.8±109.51µs    ? B/sec
sort nulls 2^12                             1.00      631.5±29.67µs    ? B/sec    1.18      746.5±87.64µs    ? B/sec
struct_array_from_vec 1024                  1.20        17.3±4.32µs    ? B/sec    1.00        14.4±0.93µs    ? B/sec
subtract 512                                1.00     487.5±173.36ns    ? B/sec    1.18      574.3±37.34ns    ? B/sec
sum nulls 512                               1.11      363.1±73.55ns    ? B/sec    1.00      327.8±97.10ns    ? B/sec
take bool 512                               1.00         2.2±0.28µs    ? B/sec    1.22         2.6±0.32µs    ? B/sec
take bool nulls 1024                        1.00         4.0±0.25µs    ? B/sec    2.04         8.1±2.08µs    ? B/sec
take bool nulls 512                         1.00         2.2±0.13µs    ? B/sec    1.80         3.9±0.77µs    ? B/sec
take i32 1024                               1.00     1507.3±81.31ns    ? B/sec    1.57         2.4±0.58µs    ? B/sec
take i32 512                                1.00     1030.9±39.48ns    ? B/sec    1.32     1364.4±97.02ns    ? B/sec
take i32 nulls 1024                         1.00     1538.9±51.37ns    ? B/sec    1.81         2.8±1.00µs    ? B/sec
take i32 nulls 512                          1.00     1023.8±51.95ns    ? B/sec    2.48         2.5±1.64µs    ? B/sec
take str 512                                1.00         4.1±0.51µs    ? B/sec    1.14         4.6±0.96µs    ? B/sec
take str null indices 1024                  1.00         6.1±0.40µs    ? B/sec    1.18         7.2±1.63µs    ? B/sec
take str null indices 512                   1.00         3.8±0.15µs    ? B/sec    1.21         4.6±1.32µs    ? B/sec
take str null values 1024                   1.00         6.0±0.40µs    ? B/sec    1.18         7.1±1.02µs    ? B/sec
```