askoa opened a new issue, #3701:
URL: https://github.com/apache/arrow-rs/issues/3701
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
The current implementation of `take_run` only handles `PrimitiveArray` values.
It is also slow because it compares the taken values to build the output runs.
Extending this approach to String and Binary values would make it much slower.
**Describe the solution you'd like**
Instead of run-encoding the taken values, we can run-encode the taken physical
indices. This will be significantly faster for String and Binary values because
it avoids comparing values. The drawback is that in certain scenarios the
output might not be run-encoded as compactly as it could be. For example, given
a `RunArray { run_ends=[2,4,6,8], values=[1,2,1,2] }` and take indices
`[2,3,6,7]`, the output will be `RunArray { run_ends=[2,4], values=[2,2] }`
rather than `RunArray { run_ends=[4], values=[2] }`.
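
The following is a minimal sketch of the proposed idea using plain slices and vectors rather than the arrow-rs types. The function names `physical_index` and `take_run_by_physical_index` are hypothetical and are not part of the arrow-rs API, and the sketch assumes every take index is smaller than the last run end.

```rust
/// Map a logical index to the physical index of the run that contains it.
/// `run_ends` holds the exclusive logical end of each run, in ascending order.
/// Assumption: `logical` is smaller than the last entry of `run_ends`.
fn physical_index(run_ends: &[usize], logical: usize) -> usize {
    // Binary search for the first run whose end is strictly greater than `logical`.
    run_ends.partition_point(|&end| end <= logical)
}

/// Hypothetical helper: run-encode the taken *physical indices* instead of the
/// taken values. Returns the new run ends and, for each output run, the
/// physical index into the original values array.
fn take_run_by_physical_index(run_ends: &[usize], take: &[usize]) -> (Vec<usize>, Vec<usize>) {
    let mut new_run_ends: Vec<usize> = Vec::new();
    let mut taken_physical: Vec<usize> = Vec::new();
    for (i, &logical) in take.iter().enumerate() {
        let physical = physical_index(run_ends, logical);
        if taken_physical.last() == Some(&physical) {
            // Same physical index as the previous element: extend the current run.
            *new_run_ends.last_mut().unwrap() = i + 1;
        } else {
            // A different physical index starts a new run, even when the
            // underlying values happen to be equal.
            new_run_ends.push(i + 1);
            taken_physical.push(physical);
        }
    }
    (new_run_ends, taken_physical)
}
```

With `run_ends = [2, 4, 6, 8]` and take indices `[2, 3, 6, 7]`, this yields new run ends `[2, 4]` and physical indices `[1, 3]`. Gathering the original values at those physical indices gives `[2, 2]`, matching the example above. Only integer indices are compared, so the cost does not depend on the value type.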
**Describe alternatives you've considered**
We continue with the current approach of comparing values which, in certain
scenarios, will result in a more compactly run-encoded array at the cost of
performance.
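
For comparison, a value-comparing take can be sketched as follows. This illustrates the trade-off only and is not the actual arrow-rs code: `take_run_by_value` is a hypothetical name, plain slices stand in for the Arrow arrays, and the take indices are assumed to be in bounds.

```rust
/// Hypothetical helper: take from a run-encoded array and run-encode the
/// result by comparing the taken *values*. Adjacent equal values are merged
/// into one run even when they come from different physical positions.
fn take_run_by_value<T: PartialEq + Clone>(
    run_ends: &[usize],
    values: &[T],
    take: &[usize],
) -> (Vec<usize>, Vec<T>) {
    let mut new_run_ends: Vec<usize> = Vec::new();
    let mut new_values: Vec<T> = Vec::new();
    for (i, &logical) in take.iter().enumerate() {
        // First run whose exclusive end is greater than the logical index.
        let physical = run_ends.partition_point(|&end| end <= logical);
        let value = &values[physical];
        if new_values.last() == Some(value) {
            // Value comparison: this is the costly step for strings and binary data.
            *new_run_ends.last_mut().unwrap() = i + 1;
        } else {
            new_run_ends.push(i + 1);
            new_values.push(value.clone());
        }
    }
    (new_run_ends, new_values)
}
```

For `run_ends = [2, 4, 6, 8]`, `values = [1, 2, 1, 2]` and take indices `[2, 3, 6, 7]`, this produces run ends `[4]` and values `[2]`, at the price of one value comparison per taken element.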
**Additional context**
https://github.com/apache/arrow-rs/pull/3622#discussion_r1089826535