alamb commented on code in PR #8146:
URL: https://github.com/apache/arrow-rs/pull/8146#discussion_r2283296290


##########
arrow-select/src/coalesce.rs:
##########
@@ -236,6 +262,13 @@ impl BatchCoalescer {
     /// assert_eq!(completed_batch, expected_batch);
     /// ```
     pub fn push_batch(&mut self, batch: RecordBatch) -> Result<(), ArrowError> 
{
+        if let Some(limit) = self.biggest_coalesce_batch_size {

Review Comment:
   > > Also, thinking about it, we can as well prevent flushing in progress 
array if we do not require the output to be sorted (so smaller in progress 
batches will still be combined into a output batch of the target size).
   > 
   > Also added todo for this point, i can create a follow-up ticket after this 
PR, thanks!
   
   I think this is a wise idea. I suspect there many things in DataFusion that 
implicitly assume that rows are not reordered so we will have to test such a 
change carefully



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to