alamb commented on issue #9370:
URL: https://github.com/apache/arrow-datafusion/issues/9370#issuecomment-1967773634

   > Hello @alamb, I'd love to work on this
   
   Thanks @edmondop, that would be great. I think this could get quite tricky if we are not careful, so I would suggest taking it in phases.
   
   Perhaps you can first try to remove `CoalesceBatchesExec` by refactoring its code into a struct like:
   ```rust
   use arrow::record_batch::RecordBatch;

   struct BatchCoalescer {
     batches: Vec<RecordBatch>,
     target_batch_size: usize,
   }
   
   impl BatchCoalescer {
     /// Buffers the specified record batch. If more than `target_batch_size`
     /// rows are buffered, clears the buffer and emits a RecordBatch with
     /// `target_batch_size` rows
     fn push(&mut self, batch: RecordBatch) -> Option<RecordBatch> { todo!() }
   
     /// Completes this coalescer and emits any buffered rows
     fn finish(mut self) -> Option<RecordBatch> { todo!() }
   }
   ```
   
   Then use that struct directly in `RepartitionExec` and any other places that require `CoalesceBatchesExec`.
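
   For illustration, here is a minimal, dependency-free sketch of the coalescer described above, under some assumptions: `Batch` is a hypothetical single-column stand-in for arrow's `RecordBatch` so the sketch runs on its own, `new` and the `buffered_rows` counter are illustrative additions, and `push` emits exactly `target_batch_size` rows (keeping any remainder buffered) as the doc comment suggests. With real arrow types, the concatenation step would likely use `arrow::compute::concat_batches` and the split `RecordBatch::slice`.

   ```rust
   /// Hypothetical stand-in for arrow's `RecordBatch` (assumption, not arrow's API):
   /// a single column of `i64` values, one value per row.
   #[derive(Debug, Clone, PartialEq)]
   pub struct Batch {
       pub values: Vec<i64>,
   }

   impl Batch {
       pub fn num_rows(&self) -> usize {
           self.values.len()
       }
   }

   /// Buffers small batches and re-emits them in chunks of `target_batch_size` rows.
   pub struct BatchCoalescer {
       batches: Vec<Batch>,
       buffered_rows: usize, // total rows currently held in `batches`
       target_batch_size: usize,
   }

   impl BatchCoalescer {
       pub fn new(target_batch_size: usize) -> Self {
           assert!(target_batch_size > 0, "target_batch_size must be positive");
           Self { batches: Vec::new(), buffered_rows: 0, target_batch_size }
       }

       /// Buffers `batch`. Once at least `target_batch_size` rows are buffered,
       /// emits exactly `target_batch_size` rows and keeps the remainder buffered.
       pub fn push(&mut self, batch: Batch) -> Option<Batch> {
           if batch.num_rows() == 0 {
               return None;
           }
           self.buffered_rows += batch.num_rows();
           self.batches.push(batch);
           if self.buffered_rows < self.target_batch_size {
               return None;
           }
           // Concatenate everything buffered, then split off the first
           // `target_batch_size` rows as the output batch.
           let mut values: Vec<i64> = self.batches.drain(..).flat_map(|b| b.values).collect();
           let rest = values.split_off(self.target_batch_size);
           self.buffered_rows = rest.len();
           if !rest.is_empty() {
               self.batches.push(Batch { values: rest });
           }
           Some(Batch { values })
       }

       /// Completes this coalescer, emitting any rows still buffered.
       pub fn finish(self) -> Option<Batch> {
           if self.buffered_rows == 0 {
               return None;
           }
           let values: Vec<i64> = self.batches.into_iter().flat_map(|b| b.values).collect();
           Some(Batch { values })
       }
   }
   ```

   An operator such as `RepartitionExec` would then call `push` for each incoming batch, forward any `Some` result downstream, and call `finish` once its input is exhausted. Because `push` returns at most one batch, a single very large input can leave more than `target_batch_size` rows buffered; a real implementation might prefer to return several batches at once.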


-- 
This is an automated message from the Apache Git Service.