jorgecarleitao opened a new pull request #8169:
URL: https://github.com/apache/arrow/pull/8169
This PR aims to improve the speed of the `take` kernel by using ~the dark
magic~ buffers that @nevi-me taught me in another PR.
However, the current tests fail, and I would like your opinions (e.g.
@andygrove , @nevi-me , @paddyhoran , @alamb ) on this. The reason the tests
fail is that the `take` operation is not uniquely defined. Specifically:
1. should the null bitmap have the same length as the final array?
2. the values in slots where the index is null are unspecified according to
the spec, so different implementations of `take` can lead to different results.
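To make point 2 concrete, here is a minimal sketch in plain Rust. It does not use the arrow crate: `ToyArray`, `logical` and `take_with_filler` are hypothetical names invented for this illustration, and validity is stored as one `bool` per slot rather than as a packed bitmap. Two `take` variants that differ only in what they write into null slots produce different value buffers, yet the arrays are logically equal.

```rust
/// A toy nullable i32 array: a values buffer plus one validity flag per slot.
/// (Hypothetical type for illustration; not the arrow crate's representation.)
#[derive(Debug, Clone)]
struct ToyArray {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl ToyArray {
    /// Logical view: null slots become `None`, hiding whatever is in `values`.
    fn logical(&self) -> Vec<Option<i32>> {
        self.values
            .iter()
            .zip(&self.validity)
            .map(|(v, ok)| if *ok { Some(*v) } else { None })
            .collect()
    }
}

/// A `take` that writes `filler` into the value slot of every null output.
/// Changing `filler` models two implementations that treat null slots differently.
fn take_with_filler(input: &ToyArray, indices: &[Option<usize>], filler: i32) -> ToyArray {
    let mut values = Vec::with_capacity(indices.len());
    let mut validity = Vec::with_capacity(indices.len());
    for index in indices {
        match index {
            // A non-null index pointing at a valid slot copies the value.
            Some(i) if input.validity[*i] => {
                values.push(input.values[*i]);
                validity.push(true);
            }
            // A null index, or an index pointing at a null slot, yields null.
            _ => {
                values.push(filler);
                validity.push(false);
            }
        }
    }
    ToyArray { values, validity }
}
```

Under these assumptions, a test that compares raw value buffers would fail for two equally valid implementations, while a test that compares `logical()` views would pass for both.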
```
git checkout 30143fc493 && cargo bench --bench take_kernels && \
git checkout take_faster && cargo bench --bench take_kernels
```
Result:
```
take i32 512        time:   [2.9221 us 2.9282 us 2.9348 us]
                    change: [-46.697% -45.799% -44.703%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
take i32 1024       time:   [5.0237 us 5.0376 us 5.0548 us]
                    change: [-48.840% -48.433% -48.087%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
take i32 1024 #2    time:   [4.8138 us 4.8255 us 4.8389 us]
                    change: [-50.574% -50.023% -49.285%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe
take bool 512       time:   [2.4694 us 2.4765 us 2.4843 us]
                    change: [-50.789% -50.390% -50.012%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
take bool 1024      time:   [4.0698 us 4.2407 us 4.4884 us]
                    change: [-51.026% -49.906% -48.535%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
take str 512        time:   [8.1593 us 8.9810 us 10.114 us]
                    change: [-66.908% -57.395% -45.151%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe
take str 1024       time:   [12.098 us 12.151 us 12.208 us]
                    change: [-78.241% -75.656% -72.725%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
```