askoa opened a new issue, #3701:
URL: https://github.com/apache/arrow-rs/issues/3701

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   The current implementation of `take_run` only handles `PrimitiveArray`.  
It is also slow because it compares the taken values to find run boundaries. 
Extending this approach to String and Binary values would make it much slower.
   
   **Describe the solution you'd like**
   Instead of run-encoding the taken values, we can run-encode the taken physical 
indices. This will be significantly faster for String and Binary values as we 
will avoid comparing values. The drawback of this approach is that in certain 
scenarios the output might not be efficiently run encoded. For example, given a 
`RunArray { run_ends=[2,4,6,8], values=[1,2,1,2] }` and take indices 
`[2,3,6,7]`, the taken physical indices are `[1,1,3,3]`: two distinct runs 
that happen to hold the same value. The output will therefore be 
`RunArray { run_ends=[2,4], values=[2,2] }` rather than 
`RunArray { run_ends=[4], values=[2] }`.
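
   The trade-off can be illustrated with a minimal sketch. This is not the arrow-rs implementation: `Runs`, `physical_index`, `take_by_physical_index` and `take_by_value` are hypothetical names used only for illustration, the arrays are plain `Vec`s, and nulls and out-of-range indices are not handled.

```rust
/// Hypothetical stand-in for a run-end encoded array (not arrow-rs `RunArray`).
#[derive(Debug, Clone, PartialEq)]
struct Runs<T> {
    run_ends: Vec<usize>,
    values: Vec<T>,
}

/// Map a logical index to the physical index of the run that contains it:
/// the first run whose end is strictly greater than the logical index.
fn physical_index(run_ends: &[usize], logical: usize) -> usize {
    run_ends.partition_point(|&end| end <= logical)
}

/// Proposed approach: start a new output run whenever the physical index
/// changes. Values are only copied, never compared.
fn take_by_physical_index<T: Clone>(input: &Runs<T>, indices: &[usize]) -> Runs<T> {
    let mut out = Runs { run_ends: Vec::new(), values: Vec::new() };
    let mut prev: Option<usize> = None;
    for (i, &logical) in indices.iter().enumerate() {
        // Panics below if `logical` is out of range; acceptable for a sketch.
        let phys = physical_index(&input.run_ends, logical);
        if prev == Some(phys) {
            // Same physical run as the previous index: extend the current run.
            *out.run_ends.last_mut().unwrap() = i + 1;
        } else {
            out.run_ends.push(i + 1);
            out.values.push(input.values[phys].clone());
            prev = Some(phys);
        }
    }
    out
}

/// Alternative: start a new output run only when the taken value changes.
/// This requires a value comparison per index, which is costly for strings.
fn take_by_value<T: Clone + PartialEq>(input: &Runs<T>, indices: &[usize]) -> Runs<T> {
    let mut out = Runs { run_ends: Vec::new(), values: Vec::new() };
    for (i, &logical) in indices.iter().enumerate() {
        let value = &input.values[physical_index(&input.run_ends, logical)];
        if out.values.last() == Some(value) {
            *out.run_ends.last_mut().unwrap() = i + 1;
        } else {
            out.run_ends.push(i + 1);
            out.values.push(value.clone());
        }
    }
    out
}
```

   With `run_ends=[2,4,6,8]`, `values=[1,2,1,2]` and take indices `[2,3,6,7]`, `take_by_physical_index` yields two runs and `take_by_value` yields one, matching the example above.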
   
   **Describe alternatives you've considered**
   We continue with the current approach of comparing values which, in certain 
scenarios, will result in a more efficiently run-encoded array at the cost of 
performance.
   
   **Additional context**
   https://github.com/apache/arrow-rs/pull/3622#discussion_r1089826535

