ctsk commented on issue #6692:
URL: https://github.com/apache/arrow-rs/issues/6692#issuecomment-2748139281

   > Does that mean you have an implementation that you could potentially share 
/ contribute?
   
   I've opened draft PRs for the changes in arrow 
(https://github.com/apache/arrow-rs/pull/7325) and in datafusion 
(https://github.com/apache/datafusion/pull/15392).
   
   In arrow, I added a `take_in` kernel that takes an array, indices, and a 
builder, then appends the elements of `array` at indices `indices` to the given 
builder. I also added a `RecordBatchBuilder` that holds a collection of 
`ArrayBuilder`s for convenience (similar to how a `RecordBatch` holds a 
collection of arrays).
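   
   As a rough illustration of the "take into a builder" idea (this is not the 
code from the PR), the sketch below models it with std-only Rust. `ToyBuilder` 
and this `take_in` signature are hypothetical stand-ins; the real kernel 
operates on Arrow arrays and builders and may look quite different.

```rust
/// Toy stand-in for an Arrow array builder (hypothetical, not the arrow-rs type):
/// an append-only buffer of nullable i32 values.
#[derive(Debug, Default)]
pub struct ToyBuilder {
    values: Vec<Option<i32>>,
}

impl ToyBuilder {
    /// Number of elements appended so far.
    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Returns the accumulated values and resets the builder to empty.
    pub fn finish(&mut self) -> Vec<Option<i32>> {
        std::mem::take(&mut self.values)
    }
}

/// Appends `array[i]` for every `i` in `indices` to `builder`, in index order.
/// Unlike a plain `take`, no intermediate output array is allocated per call.
pub fn take_in(
    array: &[Option<i32>],
    indices: &[usize],
    builder: &mut ToyBuilder,
) -> Result<(), String> {
    // Validate all indices first so the builder is left untouched on error.
    if let Some(&bad) = indices.iter().find(|&&i| i >= array.len()) {
        return Err(format!(
            "index {} out of bounds for array of length {}",
            bad,
            array.len()
        ));
    }
    builder.values.reserve(indices.len());
    builder.values.extend(indices.iter().map(|&i| array[i]));
    Ok(())
}
```

   Calling `take_in` repeatedly with the same builder accumulates rows from 
several input arrays, so a single `finish()` yields one combined output.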
   
   In datafusion, I tried modifying `RepartitionExec` to use this API. This 
meant:
    1. Moving the `coalesce` step closer to the `take` step. Currently, 
datafusion only coalesces after distributing the batch to the destination 
partition. To use this kind of API, we need to coalesce in each producing 
thread before distributing.
    2. Replacing the take+coalesce combo with a `RecordBatchBuilder`.
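   
   The two steps above can be pictured with a small std-only model (a sketch 
under assumptions, not the datafusion change itself). `ToyRepartitioner`, its 
methods, and the `row % n` routing are hypothetical; a real repartitioner 
hashes key columns and appends into one builder per column.

```rust
/// Per-producer state: one pending output buffer per destination partition.
/// Rows are modeled as plain u64 values (hypothetical simplification).
pub struct ToyRepartitioner {
    target_rows: usize,
    pending: Vec<Vec<u64>>,
}

impl ToyRepartitioner {
    pub fn new(num_partitions: usize, target_rows: usize) -> Self {
        assert!(num_partitions > 0 && target_rows > 0);
        Self {
            target_rows,
            pending: vec![Vec::new(); num_partitions],
        }
    }

    /// Routes every row of `batch` into its partition's pending buffer and
    /// returns `(partition, rows)` for each buffer that reached `target_rows`.
    /// Coalescing therefore happens in the producer, before distribution.
    pub fn push_batch(&mut self, batch: &[u64]) -> Vec<(usize, Vec<u64>)> {
        let n = self.pending.len() as u64;
        let mut ready = Vec::new();
        for &row in batch {
            // Stand-in for hashing the partitioning key.
            let p = (row % n) as usize;
            self.pending[p].push(row);
            if self.pending[p].len() == self.target_rows {
                let full = std::mem::replace(
                    &mut self.pending[p],
                    Vec::with_capacity(self.target_rows),
                );
                ready.push((p, full));
            }
        }
        ready
    }

    /// Flushes whatever is still pending once the input is exhausted.
    pub fn finish(&mut self) -> Vec<(usize, Vec<u64>)> {
        self.pending
            .iter_mut()
            .enumerate()
            .filter(|(_, rows)| !rows.is_empty())
            .map(|(p, rows)| (p, std::mem::take(rows)))
            .collect()
    }
}
```

   In this model the consumer side only ever receives batches that are already 
full-sized (or the final remainder), so no separate coalesce pass is needed 
after distribution.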
   
   I suspect this currently fails to build due to the `chrono` issue...


