jorgecarleitao opened a new pull request #9329:
URL: https://github.com/apache/arrow/pull/9329


   This PR is another of those experiments to gather feedback and share results.
   
   # Rationale
   
   Currently, all our arrays hold an `Arc<ArrayData>`, which they expose via 
`Array::data` and `Array::data_ref`. This adds a level of indirection. As far 
as I can tell, the current code base does not need the `Arc<>`.
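   
   To make the indirection concrete, here is a minimal sketch of the two 
layouts. `Data`, `ArcArray` and `PlainArray` are hypothetical stand-ins, not 
the real arrow types, and I am assuming that the buffers inside the data are 
already reference counted:
   
   ```rust
   use std::sync::Arc;

   // Hypothetical, trimmed-down stand-in for `ArrayData`.
   // Assumption: the buffer it holds is already reference counted.
   #[derive(Clone)]
   struct Data {
       values: Arc<[i32]>,
   }

   // Layout before this PR: the array holds a pointer to heap-allocated data.
   struct ArcArray {
       data: Arc<Data>,
   }

   // Layout in this PR: the array holds the data inline.
   struct PlainArray {
       data: Data,
   }

   impl ArcArray {
       fn new(data: Data) -> Self {
           // One extra heap allocation just to wrap `Data`.
           Self { data: Arc::new(data) }
       }

       fn len(&self) -> usize {
           // Follows the `Arc` pointer first, then the buffer pointer.
           self.data.values.len()
       }
   }

   impl PlainArray {
       fn new(data: Data) -> Self {
           Self { data }
       }

       fn len(&self) -> usize {
           // Only the buffer pointer is followed.
           self.data.values.len()
       }
   }

   fn sample() -> Data {
       Data { values: Arc::from(vec![1, 2, 3]) }
   }

   fn main() {
       let before = ArcArray::new(sample());
       let after = PlainArray::new(sample());
       assert_eq!(before.len(), after.len());
   }
   ```
   
   Both versions expose the same information; the second one just reaches it 
with one pointer hop less and without the extra allocation.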
   
   In #9271 we are observing some perf issues with small arrays, and one of 
the ideas that came up was to get rid of the `Arc` and see what happens.
   
   # This PR
   
   This PR replaces all `Arc<ArrayData>` by `ArrayData`. On the one hand, 
this makes cloning an array a tad more expensive (cloning an `ArrayData` 
instead of an `Arc`), even though we seldom clone an `Arc<ArrayData>`. On the 
other hand, the compiler can often optimize more, as many operations will 
never leave the stack.
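   
   A rough sketch of the cloning trade-off, again with a hypothetical `Data` 
stand-in and the assumption that buffers are reference counted (so cloning a 
buffer does not copy its bytes):
   
   ```rust
   use std::sync::Arc;

   // Hypothetical, trimmed-down stand-in for `ArrayData`.
   #[derive(Clone)]
   struct Data {
       // Assumption: reference-counted buffer, cheap to clone.
       values: Arc<[i32]>,
       null_count: usize,
   }

   // Old layout: one atomic increment, whatever the number of fields.
   fn clone_shared(data: &Arc<Data>) -> Arc<Data> {
       Arc::clone(data)
   }

   // New layout: every field is cloned (here one refcount bump and one
   // `usize` copy), which is the "tad more expensive" part.
   fn clone_plain(data: &Data) -> Data {
       data.clone()
   }

   fn main() {
       let data = Data { values: Arc::from(vec![1, 2, 3]), null_count: 0 };

       let copy = clone_plain(&data);
       // The values themselves are not copied: both point at one allocation.
       assert!(Arc::ptr_eq(&data.values, &copy.values));
       assert_eq!(copy.null_count, 0);

       let shared = Arc::new(data);
       let shared2 = clone_shared(&shared);
       assert_eq!(Arc::strong_count(&shared), 2);
       assert_eq!(shared2.values.len(), 3);
   }
   ```
   
   The real `ArrayData` has more fields than this, so the per-clone cost is 
higher than in the sketch, but it scales with the number of fields and not 
with the amount of data.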
   
   The gist of the benchmarks below is:
   * ~10%-20% improvement over basically everything
   * ~20%-100% improvement in `take`
   
   There is some noise: some benches that are not expected to be affected by 
this change show differences anyway. I am trying to reduce it by running the 
benches at night.
   
   Also, the arrow tests pass, but I did not port this to the other crates nor 
the SIMD branch of the code.
   
   Speed aside, I personally like this PR because it makes working with 
`ArrayData` and arrays so much simpler: no need for `Arc::new`, `as_ref` and 
company.
   
   # Questions
   
   * Does anyone know why we are using `Arc<ArrayData>` in all arrays?
   * Do you envision an issue with removing the `Arc`?
   * Would someone be so kind as to run the benches independently, just to be 
sure?
   
   # Benchmarks
   
   ```bash
   # modify Cargo.toml by adding `bench = false` to the section [lib]
   
   git checkout master
   cargo bench --benches -- --save-baseline `git branch --show-current`-`git rev-parse --short HEAD`
   
   git checkout arcless
   cargo bench --benches -- --save-baseline `git branch --show-current`-`git rev-parse --short HEAD`
   ```
   
   ```bash
   critcmp arcless-3dbcaca49 master-437c8c944 -t 10
   ```
   
   ```
   group                                         arcless-3dbcaca49                       master-437c8c944
   -----                                         -----------------                       ----------------
   add_nulls_512                                 1.00   509.1±98.69ns        ? B/sec     1.14   581.9±98.83ns        ? B/sec
   array_from_vec 128                            1.00  1180.2±304.50ns        ? B/sec    1.20  1411.2±475.00ns        ? B/sec
   array_slice 128                               1.00  383.5±430.02ns        ? B/sec     1.23   471.3±94.42ns        ? B/sec
   array_slice 2048                              1.00   393.3±94.60ns        ? B/sec     1.22  478.2±293.70ns        ? B/sec
   array_slice 512                               1.00   393.0±77.38ns        ? B/sec     1.26  496.1±145.06ns        ? B/sec
   array_string_from_vec 128                     1.00      3.5±0.39µs        ? B/sec     1.22      4.3±1.68µs        ? B/sec
   array_string_from_vec 256                     1.00      4.4±0.46µs        ? B/sec     1.11      4.9±0.94µs        ? B/sec
   buffer_bit_ops or                             1.11  706.1±102.45ns        ? B/sec     1.00   633.3±56.65ns        ? B/sec
   cast date32 to date64 512                     1.00      7.2±0.85µs        ? B/sec     1.11      8.0±1.25µs        ? B/sec
   cast int32 to float64 512                     1.00      3.0±0.35µs        ? B/sec     1.16      3.4±0.77µs        ? B/sec
   cast int32 to int64 512                       1.00      3.3±0.41µs        ? B/sec     1.11      3.6±0.67µs        ? B/sec
   cast time32s to time32ms 512                  1.13  1864.9±585.48ns        ? B/sec    1.00  1648.3±145.69ns        ? B/sec
   cast timestamp_ms to i64 512                  1.00   350.6±67.17ns        ? B/sec     1.35   474.9±72.34ns        ? B/sec
   cast timestamp_ms to timestamp_ns 512         1.00      2.3±0.28µs        ? B/sec     1.22      2.8±4.01µs        ? B/sec
   cast utf8 to date64 512                       1.22    96.5±22.75µs        ? B/sec     1.00     79.1±6.32µs        ? B/sec
   concat str 1024                               1.00      9.9±0.41µs        ? B/sec     1.34     13.2±5.26µs        ? B/sec
   eq scalar Float32                             1.25    82.9±33.34µs        ? B/sec     1.00     66.4±6.96µs        ? B/sec
   equal_512                                     1.00    28.1±18.85ns        ? B/sec     1.76    49.5±12.56ns        ? B/sec
   equal_bool_512                                1.00     21.0±4.26ns        ? B/sec     1.97     41.5±2.29ns        ? B/sec
   equal_bool_513                                1.00    28.3±25.49ns        ? B/sec     1.60     45.4±3.84ns        ? B/sec
   equal_nulls_512                               1.00      2.4±0.19µs        ? B/sec     1.10      2.7±0.37µs        ? B/sec
   equal_string_512                              1.00    98.3±21.85ns        ? B/sec     1.14    111.9±5.30ns        ? B/sec
   equal_string_nulls_512                        1.00      3.7±0.46µs        ? B/sec     1.10      4.0±1.29µs        ? B/sec
   filter context f32                            1.00   503.9±14.15µs        ? B/sec     1.11   561.8±68.59µs        ? B/sec
   filter context f32 high selectivity           1.00    309.3±8.26µs        ? B/sec     1.11   343.2±57.50µs        ? B/sec
   filter context string high selectivity        1.00  1111.6±36.80µs        ? B/sec     1.11  1238.6±174.82µs        ? B/sec
   filter context u8                             1.00   238.9±12.44µs        ? B/sec     1.18  281.0±112.36µs        ? B/sec
   filter context u8 low selectivity             1.17      2.4±1.25µs        ? B/sec     1.00      2.1±0.26µs        ? B/sec
   filter context u8 w NULLs                     1.00   526.7±98.56µs        ? B/sec     1.11  586.5±155.02µs        ? B/sec
   filter context u8 w NULLs high selectivity    1.00    298.7±6.11µs        ? B/sec     1.15   344.2±63.42µs        ? B/sec
   filter context u8 w NULLs low selectivity     1.00      2.7±0.21µs        ? B/sec     1.31      3.5±5.02µs        ? B/sec
   filter f32                                    1.00   802.9±44.38µs        ? B/sec     1.21  971.4±135.07µs        ? B/sec
   filter u8 low selectivity                     1.00      7.8±0.23µs        ? B/sec     1.14      8.9±1.47µs        ? B/sec
   from_slice                                    1.13  1665.5±215.62µs        ? B/sec    1.00  1475.4±324.38µs        ? B/sec
   from_slice prepared                           1.31  1092.8±125.08µs        ? B/sec    1.00   834.2±59.41µs        ? B/sec
   gt Float32                                    1.00    71.8±30.69µs        ? B/sec     1.18     84.6±9.31µs        ? B/sec
   gt scalar Float32                             1.82    74.1±25.74µs        ? B/sec     1.00     40.8±4.06µs        ? B/sec
   gt_eq scalar Float32                          1.14    71.8±24.75µs        ? B/sec     1.00     63.0±7.38µs        ? B/sec
   json_list_primitive_to_record_batch           1.00     64.6±4.38µs        ? B/sec     1.14    73.4±11.01µs        ? B/sec
   length                                        1.00      2.9±0.07µs        ? B/sec     1.29      3.7±0.77µs        ? B/sec
   like_utf8 scalar ends with                    1.00   243.8±15.05µs        ? B/sec     1.21   294.5±45.77µs        ? B/sec
   like_utf8 scalar equals                       1.21   101.6±13.08µs        ? B/sec     1.00    83.7±10.13µs        ? B/sec
   limit 512, 512                                1.00   327.3±60.96ns        ? B/sec     1.38  451.1±130.11ns        ? B/sec
   lt Float32                                    1.00     69.4±7.77µs        ? B/sec     1.27    88.3±12.27µs        ? B/sec
   lt scalar Float32                             1.00     66.5±4.71µs        ? B/sec     1.31    87.3±25.97µs        ? B/sec
   lt_eq Float32                                 1.00   118.9±19.95µs        ? B/sec     1.26   149.9±56.97µs        ? B/sec
   max nulls 512                                 1.00  1567.2±204.81ns        ? B/sec    1.16  1820.6±260.46ns        ? B/sec
   min nulls 512                                 1.00  1706.9±802.12ns        ? B/sec    1.16  1980.7±259.55ns        ? B/sec
   min nulls string 512                          1.00      7.5±0.44µs        ? B/sec     1.18      8.9±1.66µs        ? B/sec
   min string 512                                1.00      5.6±0.30µs        ? B/sec     1.10      6.2±1.42µs        ? B/sec
   multiply 512                                  1.00  541.6±224.47ns        ? B/sec     1.11  601.0±102.46ns        ? B/sec
   mutable                                       1.11  630.0±349.07µs        ? B/sec     1.00   566.7±77.31µs        ? B/sec
   mutable str 1024                              1.00  1515.5±59.36µs        ? B/sec     1.17  1776.6±329.88µs        ? B/sec
   neq Float32                                   1.00    77.8±16.48µs        ? B/sec     1.18    91.5±20.30µs        ? B/sec
   neq scalar Float32                            1.52   101.2±52.55µs        ? B/sec     1.00     66.4±6.21µs        ? B/sec
   sort 2^10                                     1.00    147.2±4.97µs        ? B/sec     1.19   175.6±25.84µs        ? B/sec
   sort 2^12                                     1.00   726.9±50.41µs        ? B/sec     1.17  849.8±109.51µs        ? B/sec
   sort nulls 2^12                               1.00   631.5±29.67µs        ? B/sec     1.18   746.5±87.64µs        ? B/sec
   struct_array_from_vec 1024                    1.20     17.3±4.32µs        ? B/sec     1.00     14.4±0.93µs        ? B/sec
   subtract 512                                  1.00  487.5±173.36ns        ? B/sec     1.18   574.3±37.34ns        ? B/sec
   sum nulls 512                                 1.11   363.1±73.55ns        ? B/sec     1.00   327.8±97.10ns        ? B/sec
   take bool 512                                 1.00      2.2±0.28µs        ? B/sec     1.22      2.6±0.32µs        ? B/sec
   take bool nulls 1024                          1.00      4.0±0.25µs        ? B/sec     2.04      8.1±2.08µs        ? B/sec
   take bool nulls 512                           1.00      2.2±0.13µs        ? B/sec     1.80      3.9±0.77µs        ? B/sec
   take i32 1024                                 1.00  1507.3±81.31ns        ? B/sec     1.57      2.4±0.58µs        ? B/sec
   take i32 512                                  1.00  1030.9±39.48ns        ? B/sec     1.32  1364.4±97.02ns        ? B/sec
   take i32 nulls 1024                           1.00  1538.9±51.37ns        ? B/sec     1.81      2.8±1.00µs        ? B/sec
   take i32 nulls 512                            1.00  1023.8±51.95ns        ? B/sec     2.48      2.5±1.64µs        ? B/sec
   take str 512                                  1.00      4.1±0.51µs        ? B/sec     1.14      4.6±0.96µs        ? B/sec
   take str null indices 1024                    1.00      6.1±0.40µs        ? B/sec     1.18      7.2±1.63µs        ? B/sec
   take str null indices 512                     1.00      3.8±0.15µs        ? B/sec     1.21      4.6±1.32µs        ? B/sec
   take str null values 1024                     1.00      6.0±0.40µs        ? B/sec     1.18      7.1±1.02µs        ? B/sec
   ```
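   
   For anyone not used to this output: as far as I understand `critcmp`, the 
number in front of each timing is the slowdown relative to the fastest entry 
in that row (so `1.00` marks the winner), and `-t 10` hides rows where the 
difference is below 10%. A small sketch of that arithmetic, using the 
`add_nulls_512` row; the `ratios` helper is hypothetical and only here for 
illustration:
   
   ```rust
   /// Slowdown of each timing relative to the fastest of the two.
   fn ratios(a_ns: f64, b_ns: f64) -> (f64, f64) {
       let fastest = a_ns.min(b_ns);
       (a_ns / fastest, b_ns / fastest)
   }

   fn main() {
       // add_nulls_512: 509.1 ns on arcless, 581.9 ns on master.
       let (arcless, master) = ratios(509.1, 581.9);
       // prints "1.00 1.14"
       println!("{:.2} {:.2}", arcless, master);
   }
   ```
   
   The ratios in the table are presumably computed from unrounded timings, so 
recomputing them from the displayed values can be off in the last digit.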



