zhuqi-lucas commented on PR #17105:
URL: https://github.com/apache/datafusion/pull/17105#issuecomment-3174912170

   > > > 🤔 The new kernel seems to be slower. I wonder if the overhead of producing precisely sized output batches is causing the issue.
   > > 
   > > Good point @alamb, I agree this is the only difference. I can open a test PR so that upstream does not generate precisely sized output batches. However, when we ensure capacity for the incremental buffer size, it seems we would also need to change the size, since the target size does not stay the same with this change.
   > > The latest benchmark seems a little better.
   > 
   > Thanks @zhuqi-lucas -- what I was thinking about was something like the following:
   > 
   > ```rust
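   > // Proposed option: allow output batches that are not exactly `target_batch_size` rows
   > let target_batch_size = 4;
   > let mut coalescer = BatchCoalescer::new(batch1.schema(), target_batch_size)
   >   .with_exact_size(false);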
   > ```
   > 
   > Before we spend a lot of time polishing and testing a PR for that, it would probably be good to hack up a POC and verify that it actually improves performance.
   > 
   > Thank you for your willingness to help with this project. It is something I have thought was important (but not critical) for a long time, so having someone to help really makes a big difference.
   
   
   Thank you @alamb for the good suggestion! It looks pretty cool to me, and a config option for this is a very clever idea.
   ```rust
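   // Proposed option: allow output batches that are not exactly `target_batch_size` rows
   let target_batch_size = 4;
   let mut coalescer = BatchCoalescer::new(batch1.schema(), target_batch_size)
     .with_exact_size(false);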
   ```
   
   I will try to address this upstream first, so we can easily test it in DataFusion.
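
To make the trade-off concrete, here is a toy, std-only sketch of the two modes. It is not the arrow-rs `BatchCoalescer`: `ToyCoalescer`, `batch_sizes`, and the `exact` flag are hypothetical names, a "batch" is just a `Vec<i32>` of rows, and it assumes that non-exact mode means "never split an input batch, and emit as soon as the buffer reaches the target".

```rust
/// Toy stand-in for a batch coalescer (hypothetical, for illustration only).
struct ToyCoalescer {
    target: usize,
    exact: bool,
    buffer: Vec<i32>,
    completed: Vec<Vec<i32>>,
}

impl ToyCoalescer {
    fn new(target: usize, exact: bool) -> Self {
        assert!(target > 0, "target batch size must be positive");
        Self {
            target,
            exact,
            buffer: Vec::with_capacity(target),
            completed: Vec::new(),
        }
    }

    fn push(&mut self, batch: &[i32]) {
        if self.exact {
            // Exact mode: split the input so every emitted batch
            // (except possibly the last) has exactly `target` rows.
            let mut rest = batch;
            while !rest.is_empty() {
                let room = self.target - self.buffer.len();
                let take = room.min(rest.len());
                self.buffer.extend_from_slice(&rest[..take]);
                rest = &rest[take..];
                if self.buffer.len() == self.target {
                    self.flush();
                }
            }
        } else {
            // Assumed non-exact mode: append the whole input and emit
            // once the buffer holds at least `target` rows.
            self.buffer.extend_from_slice(batch);
            if self.buffer.len() >= self.target {
                self.flush();
            }
        }
    }

    /// Move any buffered rows into the list of completed batches.
    fn flush(&mut self) {
        if !self.buffer.is_empty() {
            let full = std::mem::replace(&mut self.buffer, Vec::with_capacity(self.target));
            self.completed.push(full);
        }
    }

    fn finish(mut self) -> Vec<Vec<i32>> {
        self.flush();
        self.completed
    }
}

/// Row counts of the batches produced for the given inputs.
fn batch_sizes(target: usize, exact: bool, inputs: &[Vec<i32>]) -> Vec<usize> {
    let mut coalescer = ToyCoalescer::new(target, exact);
    for batch in inputs {
        coalescer.push(batch);
    }
    coalescer.finish().iter().map(Vec::len).collect()
}
```

With three 3-row inputs and a target of 4, exact mode has to slice the second and third inputs to hit the target, while the assumed non-exact mode only appends whole inputs. Whether skipping that slicing recovers the lost performance is exactly what the POC would need to measure.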


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

