Dandandan opened a new pull request #9048:
URL: https://github.com/apache/arrow/pull/9048


   This refactors `build_batch_from_indices` to make further changes 
easier, e.g. solving 
https://issues.apache.org/jira/browse/ARROW-11030
   
   * This starts handling the right (1) batch and the left (many) batches 
differently: for the right batch we can use `take` on it directly. This should 
be more efficient anyway, and in the future it also allows building the index 
array directly instead of doing extra copying.
   * Use `indices.len()` for the capacity parameter, rather than the number of 
rows on the left. This has an impact at larger sizes (e.g. SF 100), see: 
https://github.com/apache/arrow/pull/9036 Rather than estimating the capacity 
from previous batches, this bases it on the (known) number of resulting rows.
   * The refactoring makes it easier to apply the changes needed for 
https://issues.apache.org/jira/browse/ARROW-11030 where we need to remove the 
n*n work that is done for the build side.
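
To illustrate the first two points, here is a minimal standalone sketch, not the code in this PR. It uses plain `Vec<i32>` columns instead of Arrow arrays, and the function names `take_single` and `gather_many` are hypothetical. The right side is a single batch, so it can be gathered with a `take`-style lookup on row indices alone; the left side spans many batches, so each index is assumed to be a (batch index, row index) pair. In both cases the output is sized from `indices.len()`, the known number of resulting rows.

```rust
/// Gather rows from a single batch by row index.
/// This mimics what a `take` kernel does; it is a stand-in, not Arrow's API.
fn take_single(column: &[i32], indices: &[usize]) -> Vec<i32> {
    // The iterator has an exact length, so `collect` allocates
    // room for `indices.len()` values up front.
    indices.iter().map(|&row| column[row]).collect()
}

/// Gather rows from many batches given (batch index, row index) pairs.
/// The pair layout is an assumption made for this sketch.
fn gather_many(batches: &[Vec<i32>], indices: &[(usize, usize)]) -> Vec<i32> {
    // Size the output from the known number of result rows,
    // not from the number of rows in the input batches.
    let mut out = Vec::with_capacity(indices.len());
    for &(batch, row) in indices {
        out.push(batches[batch][row]);
    }
    out
}
```

With left batches `[10, 11]` and `[20, 21, 22]`, the pairs `(0, 1), (1, 2), (1, 0)` select `11, 22, 20`. With a right batch `[100, 101, 102]`, the indices `2, 0, 2` select `102, 100, 102`.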
   
   FYI @jorgecarleitao @andygrove 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

