alamb commented on issue #9370:
URL:
https://github.com/apache/arrow-datafusion/issues/9370#issuecomment-1967773634
> Hello @alamb, I'd love to work on this

Thanks @edmondop -- that would be great. I think this could get quite
tricky if we are not careful, so I would suggest taking it in phases.
Perhaps you can first try to remove `CoalesceBatchesExec` by refactoring its
code into a struct like this:
```rust
use arrow::record_batch::RecordBatch;

/// Buffers small `RecordBatch`es until enough rows have accumulated
struct BatchCoalescer {
    batches: Vec<RecordBatch>,
    target_batch_size: usize,
}

impl BatchCoalescer {
    /// Buffers the specified record batch. If at least `target_batch_size`
    /// rows are buffered, clears the buffer and emits a RecordBatch with
    /// `target_batch_size` rows
    fn push(&mut self, batch: RecordBatch) -> Option<RecordBatch> {
        todo!()
    }

    /// Completes this coalescer and emits any buffered rows
    fn finish(mut self) -> Option<RecordBatch> {
        todo!()
    }
}
```
And then use that struct directly in `RepartitionExec` and any other
places that currently require `CoalesceBatchesExec`.
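A minimal, std-only sketch of how `push` and `finish` might behave. `Batch` is a hypothetical stand-in for arrow's `RecordBatch` (a single `i64` column) so the example runs without dependencies, and `new`, `buffered_rows`, and `take_all` are illustrative additions that are not part of the proposal above. With real arrow types, the concatenate-and-split step would presumably use `arrow::compute::concat_batches` and `RecordBatch::slice`. Because `push` returns an `Option`, one input larger than the target can leave more than `target_batch_size` rows buffered; later `push` calls emit them one target-sized batch at a time, and `finish` emits whatever remains.

```rust
/// Hypothetical stand-in for arrow's `RecordBatch`: one i64 value per row.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub values: Vec<i64>,
}

impl Batch {
    pub fn num_rows(&self) -> usize {
        self.values.len()
    }
}

pub struct BatchCoalescer {
    /// Buffered input batches, in arrival order
    batches: Vec<Batch>,
    /// Total number of rows across `batches` (illustrative helper field)
    buffered_rows: usize,
    target_batch_size: usize,
}

impl BatchCoalescer {
    pub fn new(target_batch_size: usize) -> Self {
        assert!(target_batch_size > 0, "target_batch_size must be positive");
        Self { batches: Vec::new(), buffered_rows: 0, target_batch_size }
    }

    /// Buffers `batch`. Once at least `target_batch_size` rows are buffered,
    /// emits exactly `target_batch_size` rows and keeps the rest buffered.
    pub fn push(&mut self, batch: Batch) -> Option<Batch> {
        if batch.num_rows() == 0 {
            return None;
        }
        self.buffered_rows += batch.num_rows();
        self.batches.push(batch);
        if self.buffered_rows < self.target_batch_size {
            return None;
        }
        // Concatenate everything, then split off the first target-sized chunk.
        let mut combined = self.take_all();
        let rest = combined.split_off(self.target_batch_size);
        if !rest.is_empty() {
            self.buffered_rows = rest.len();
            self.batches.push(Batch { values: rest });
        }
        Some(Batch { values: combined })
    }

    /// Completes this coalescer and emits any buffered rows.
    pub fn finish(mut self) -> Option<Batch> {
        if self.buffered_rows == 0 {
            None
        } else {
            Some(Batch { values: self.take_all() })
        }
    }

    /// Concatenates all buffered batches and clears the buffer.
    fn take_all(&mut self) -> Vec<i64> {
        self.buffered_rows = 0;
        self.batches.drain(..).flat_map(|b| b.values).collect()
    }
}

fn main() {
    let mut coalescer = BatchCoalescer::new(4);
    let mut output = Vec::new();
    // Simulate small batches arriving from an upstream operator.
    let chunks: [Vec<i64>; 4] = [vec![1, 2], vec![3], vec![4, 5, 6], vec![7]];
    for chunk in chunks {
        if let Some(batch) = coalescer.push(Batch { values: chunk }) {
            output.push(batch);
        }
    }
    if let Some(batch) = coalescer.finish() {
        output.push(batch);
    }
    // prints [1, 2, 3, 4] then [5, 6, 7]
    for batch in &output {
        println!("{:?}", batch.values);
    }
}
```

Taking `self` by value in `finish` means the coalescer cannot be pushed to after it has been drained, which the compiler enforces for free.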