jorgecarleitao opened a new pull request #8853:
URL: https://github.com/apache/arrow/pull/8853


   This PR:
   
   * extends `concat` to support every type that 
`MutableArrayData` supports (i.e. it now supports nested Lists, all primitives, 
boolean, string and large string, etc.)
   * makes `concat` 6x faster for primitive types and 2x faster for string 
types (and likely also for the other types)
   * changes `concat`'s signature to `&[&Array]` instead of `&[Arc<Array>]`, to 
avoid an `Arc::clone`.
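
To illustrate the signature change in the last bullet, here is a minimal, std-only sketch. It is not code from this PR: the `Array` trait below is a hypothetical stand-in for arrow's trait, and `Int32Toy`, `total_len_arc` and `total_len_ref` are made-up names. It only shows why a slice of borrows is more convenient for callers than a slice of `Arc`s.

```rust
use std::sync::Arc;

// Hypothetical stand-in for arrow's `Array` trait; only `len` is modeled.
trait Array {
    fn len(&self) -> usize;
}

// Hypothetical toy array type, used only for this illustration.
struct Int32Toy(Vec<i32>);

impl Array for Int32Toy {
    fn len(&self) -> usize {
        self.0.len()
    }
}

// Old-style signature: every input must be wrapped in (or cloned as) an `Arc`.
fn total_len_arc(arrays: &[Arc<dyn Array>]) -> usize {
    arrays.iter().map(|a| a.len()).sum()
}

// New-style signature: plain borrows are enough, so callers holding an `Arc`
// can pass `&*arc` without touching the reference count.
fn total_len_ref(arrays: &[&dyn Array]) -> usize {
    arrays.iter().map(|a| a.len()).sum()
}
```

With the second form, a caller can mix values it owns directly and values behind an `Arc` (e.g. `total_len_ref(&[&*a, &b])`) without cloning anything.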
   
   Since `XBuilder::append_data` was specifically built for this kernel but is 
no longer used, and `MutableArrayData` offers a more generic API for it, this PR 
removes that code.
   
   The overall principle for this removal is that `Builder` is the API to build 
an arrow array from elements or slices of Rust native types, while 
`MutableArrayData` (for lack of a better name) is suited to building an arrow 
array from an existing set of arrow arrays. In the case of `concat`, this 
corresponds to mem-copies of the individual arrays (taking nulls and the like 
into account) in sequence.
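
The idea of concatenation as a sequence of bulk copies can be sketched with a toy model. The types below (`ToyArray`, `ToyMutable`, `toy_concat`) are hypothetical and std-only; they are not arrow's `MutableArrayData`, which works on buffers and bitmaps rather than `Vec<bool>`. The sketch only assumes that each source array has values plus per-slot validity.

```rust
// Toy model, NOT arrow's actual types: a nullable i32 array stored as
// values plus one validity flag per slot (true = non-null).
#[derive(Debug, Clone, PartialEq)]
struct ToyArray {
    values: Vec<i32>,
    validity: Vec<bool>,
}

// Hypothetical analogue of a mutable array built from existing arrays:
// it borrows the sources and grows by copying ranges out of them.
struct ToyMutable<'a> {
    sources: Vec<&'a ToyArray>,
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl<'a> ToyMutable<'a> {
    fn new(sources: Vec<&'a ToyArray>, capacity: usize) -> Self {
        Self {
            sources,
            values: Vec::with_capacity(capacity),
            validity: Vec::with_capacity(capacity),
        }
    }

    // Copy slots `start..end` of source `index`, values and validity, in bulk.
    fn extend(&mut self, index: usize, start: usize, end: usize) {
        let src = self.sources[index];
        self.values.extend_from_slice(&src.values[start..end]);
        self.validity.extend_from_slice(&src.validity[start..end]);
    }

    // Finish building and hand back an immutable array.
    fn freeze(self) -> ToyArray {
        ToyArray {
            values: self.values,
            validity: self.validity,
        }
    }
}

// Concatenation is then one bulk copy per input array, in sequence.
fn toy_concat(arrays: &[&ToyArray]) -> ToyArray {
    let capacity: usize = arrays.iter().map(|a| a.values.len()).sum();
    let mut mutable = ToyMutable::new(arrays.to_vec(), capacity);
    for (i, a) in arrays.iter().enumerate() {
        mutable.extend(i, 0, a.values.len());
    }
    mutable.freeze()
}
```

Note that nothing here appends element by element, which is the contrast with a `Builder`-style API that this paragraph draws.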
   
   Based on this principle, `Builder` does not need to know how to build an 
array from existing arrays (i.e. it does not need `append_data`).
   
   I would like to migrate all the tests for `XBuilder::append_data` to 
`MutableArrayData`, so as not to lose them, but for that #8850, #8852, #8851, 
#8849 and #8848 need to land first (hence this PR is a draft).
   
   Benchmarks:
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | concat str 1024 | -45.3 | 
   | concat str nulls 1024 | -61.1 | 
   | concat i32 1024 | -83.5 | 
   | concat i32 nulls 1024 | -86.1 |
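
For reference, a variation of c% in the table means the new time is the old time multiplied by (1 + c/100), so the speedup factor is the reciprocal of that. A small sketch of the conversion (the function name `speedup` is made up for this illustration):

```rust
// Convert a benchmark "variation" percentage (negative = faster) into a
// speedup factor: new_time = old_time * (1 + c/100), speedup = old / new.
fn speedup(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}
```

Applied to the table, -83.5% and -86.1% correspond to roughly 6.1x and 7.2x for `i32`, and -45.3% and -61.1% to roughly 1.8x and 2.6x for strings.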
   
   ```
   git checkout 66468daf0b3ac3ef08b7c99c690e7b845f23ad2b
   cargo bench --bench concatenate_kernel
   git checkout concat
   cargo bench --bench concatenate_kernel
   ```
   
   ```
   Previous HEAD position was 66468daf0 Added concatenate bench
   Switched to branch 'concat'
      Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
       Finished bench [optimized] target(s) in 58.72s
        Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/concatenate_kernel-94b8f5621cd4f767
   Gnuplot not found, using plotters backend
   concat i32 1024         time:   [4.2852 us 4.2912 us 4.2973 us]
                           change: [-83.690% -83.469% -83.188%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     1 (1.00%) low severe
     4 (4.00%) low mild
     3 (3.00%) high mild
     5 (5.00%) high severe
   
   concat i32 nulls 1024   time:   [4.8617 us 4.8820 us 4.9080 us]
                           change: [-86.335% -86.101% -85.813%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     4 (4.00%) high severe
   
   concat str 1024         time:   [19.472 us 19.527 us 19.593 us]
                           change: [-46.212% -45.314% -44.341%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     4 (4.00%) low mild
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   concat str nulls 1024   time:   [39.447 us 39.525 us 39.613 us]
                           change: [-61.858% -61.091% -60.311%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     3 (3.00%) low mild
     5 (5.00%) high mild
     5 (5.00%) high severe
   ```

