zhuqi-lucas commented on code in PR #8146:
URL: https://github.com/apache/arrow-rs/pull/8146#discussion_r2280471539


##########
arrow-select/src/coalesce.rs:
##########
@@ -236,6 +262,13 @@ impl BatchCoalescer {
     /// assert_eq!(completed_batch, expected_batch);
     /// ```
     pub fn push_batch(&mut self, batch: RecordBatch) -> Result<(), ArrowError> 
{
+        if let Some(limit) = self.biggest_coalesce_batch_size {

Review Comment:
   > I mean if there are already smaller batches in the in progress array and 
the condition holds, the array will be created and in progress batches will be 
copied, creating two batches.
   It would be worth testing I think only having the short path if the 
condition holds and there are no (smaller) in progress batches, so there is 
only one output batch being generated.
   
   Addressed above comments in latest PR.
   
   > Also, thinking about it, we can as well prevent flushing in progress array 
if we do not require the output to be sorted (so smaller in progress batches 
will still be combined into a output batch of the target size).
   
   Also added todo for this point, i can create a follow-up ticket after this 
PR, thanks!
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to