Dandandan opened a new pull request #9048: URL: https://github.com/apache/arrow/pull/9048
This refactors `build_batch_from_indices` to make further changes easier, e.g. solving https://issues.apache.org/jira/browse/ARROW-11030

* Handle the right side (a single batch) and the left side (many batches) differently: for the right batch we can use `take` on it directly. This should be more efficient anyway, and in the future it also allows building the index array directly instead of doing extra copying.
* Use `indices.len()` for the capacity parameter rather than the number of rows on the left. This matters at larger sizes (e.g. SF 100), see: https://github.com/apache/arrow/pull/9036 Rather than estimating the capacity from previous batches, this uses the (known) number of resulting rows.
* The refactoring makes it easier to apply the changes needed for https://issues.apache.org/jira/browse/ARROW-11030, where we need to remove the n*n work that is done for the build side.

FYI @jorgecarleitao @andygrove
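To illustrate the two ideas above, here is a minimal standalone sketch using plain `Vec`s instead of Arrow arrays. It is not the code in this PR: `take_right` and `gather_left` are hypothetical names, `take_right` only imitates what Arrow's `take` kernel does for a single batch, and the `(batch, row)` index pairs are an assumed representation of the left-side indices.

```rust
/// Hypothetical stand-in for a `take`-style gather on the single right batch:
/// returns `values[i]` for every `i` in `indices`, in order.
fn take_right(values: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| values[i]).collect()
}

/// Hypothetical gather for the left side, which consists of many batches.
/// Each index is an assumed `(batch, row)` pair.
fn gather_left(batches: &[Vec<i64>], indices: &[(usize, usize)]) -> Vec<i64> {
    // The number of output rows is known up front: it is `indices.len()`,
    // not the number of rows in any left batch.
    let mut out = Vec::with_capacity(indices.len());
    for &(batch, row) in indices {
        out.push(batches[batch][row]);
    }
    out
}
```

Sizing the output by `indices.len()` means the buffer is allocated once at the exact result size, regardless of how many rows the left batches hold.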
