ctsk commented on issue #6692: URL: https://github.com/apache/arrow-rs/issues/6692#issuecomment-2748139281
> Does that mean you have an implementation that you could potentially share / contribute?

I've opened draft PRs for the changes in arrow (https://github.com/apache/arrow-rs/pull/7325) and in datafusion (https://github.com/apache/datafusion/pull/15392).

In arrow, I added a `take_in` kernel that takes an array, indices, and a builder, then appends the elements of `array` at `indices` to the given builder. For convenience, I also added a `RecordBatchBuilder` that holds a collection of `ArrayBuilder`s (similar to how a `RecordBatch` holds a collection of arrays).

In datafusion, I tried modifying `RepartitionExec` to use this API. This meant:

1. Moving the `coalesce` step closer to the `take` step. Currently, datafusion only coalesces after distributing the batch to the destination partition; to use this kind of API, we need to coalesce in each producing thread before distributing.
2. Replacing the `take` + `coalesce` combination with a `RecordBatchBuilder`.

I suspect this currently fails to build due to the `chrono` issue...
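As a rough, std-only illustration of the pattern described above (not the code in the draft PRs), the sketch below shows a `take_in`-style kernel appending selected rows straight into a builder, a `RecordBatchBuilder` holding one builder per column, and a producer that coalesces into per-partition builders before emitting. `Int64Builder`, the `take_in` signature, `RecordBatchBuilder`, and `repartition` here are hypothetical stand-ins; the real arrow-rs `ArrayBuilder` types and DataFusion's hash partitioning look different (this uses a toy `key % n` partitioner).

```rust
// Hypothetical, std-only stand-ins; NOT the arrow-rs or DataFusion API.
type Column = Vec<Option<i64>>;

/// Minimal stand-in for an arrow builder over nullable i64 values.
#[derive(Default)]
struct Int64Builder {
    values: Column,
}

impl Int64Builder {
    fn append_option(&mut self, v: Option<i64>) {
        self.values.push(v);
    }
    fn len(&self) -> usize {
        self.values.len()
    }
    /// Hand out the accumulated values and leave the builder empty.
    fn finish(&mut self) -> Column {
        std::mem::take(&mut self.values)
    }
}

/// `take_in`-style kernel: append `array[i]` for every `i` in `indices`
/// directly into `builder`, without materializing an intermediate array.
fn take_in(array: &[Option<i64>], indices: &[usize], builder: &mut Int64Builder) {
    for &i in indices {
        builder.append_option(array[i]);
    }
}

/// Hypothetical RecordBatchBuilder: one builder per column.
struct RecordBatchBuilder {
    columns: Vec<Int64Builder>,
}

impl RecordBatchBuilder {
    fn new(num_columns: usize) -> Self {
        Self { columns: (0..num_columns).map(|_| Int64Builder::default()).collect() }
    }
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }
    /// Append the rows selected by `indices` from every column of `batch`.
    fn take_rows(&mut self, batch: &[Column], indices: &[usize]) {
        for (builder, column) in self.columns.iter_mut().zip(batch) {
            take_in(column, indices, builder);
        }
    }
    fn finish(&mut self) -> Vec<Column> {
        self.columns.iter_mut().map(|c| c.finish()).collect()
    }
}

/// Producer-side repartitioning: rows are coalesced into per-partition
/// builders, and a batch is emitted only once it reaches `target_rows`.
fn repartition(
    batch: &[Column],
    builders: &mut [RecordBatchBuilder],
    target_rows: usize,
    out: &mut Vec<(usize, Vec<Column>)>,
) {
    let n = builders.len();
    let num_rows = batch.first().map_or(0, |c| c.len());
    // Toy partitioner on column 0: partition = key mod n (nulls go to 0).
    let mut indices: Vec<Vec<usize>> = vec![Vec::new(); n];
    for row in 0..num_rows {
        let key = batch[0][row].unwrap_or(0);
        indices[key.rem_euclid(n as i64) as usize].push(row);
    }
    for (p, idx) in indices.iter().enumerate() {
        builders[p].take_rows(batch, idx);
        if builders[p].num_rows() >= target_rows {
            out.push((p, builders[p].finish()));
        }
    }
}

fn main() {
    let batch: Vec<Column> = vec![
        vec![Some(1), Some(2), Some(3), Some(4)],
        vec![Some(10), None, Some(30), Some(40)],
    ];
    let mut builders: Vec<RecordBatchBuilder> =
        (0..2).map(|_| RecordBatchBuilder::new(2)).collect();
    let mut out = Vec::new();
    // Each call routes 2 rows to each partition; nothing is emitted until
    // a partition's builder holds at least 3 rows.
    repartition(&batch, &mut builders, 3, &mut out);
    repartition(&batch, &mut builders, 3, &mut out);
    for (p, cols) in &out {
        println!("partition {p}: {cols:?}");
    }
}
```

A real operator would also flush any partially filled builders when the input stream ends; that step is omitted here for brevity.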
