alamb commented on PR #17105:
URL: https://github.com/apache/datafusion/pull/17105#issuecomment-3174858830

   > > 🤔 the new kernel seems to slow things down. I wonder if the overhead of 
producing precisely sized output batches is causing the issue
   > 
   > Good point @alamb, I agree this is the only difference. I can add a test 
PR so that upstream does not generate precisely sized output batches. But when 
we ensure capacity for the incremental buffer size, it seems we need to change 
the size, since we do not keep the same target size with this change.
   > 
   > The latest benchmark seems a little better.
   
   Thanks @zhuqi-lucas -- what I was thinking about was something like the 
following:
   
   ```rust
   let target_batch_size = 4;
   // `with_exact_size` is the option being proposed here, not an existing API
   let mut coalescer = BatchCoalescer::new(batch1.schema(), target_batch_size)
     .with_exact_size(false);
   ```
   
   Before we spend a lot of time polishing / testing a PR for that, it would 
probably be good to hack up a POC and verify that it actually improves performance.
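   
   To make the trade-off concrete, here is a toy model of the two modes. It is 
only a sketch under stated assumptions: `ToyCoalescer` and `output_sizes` are 
hypothetical names, a "batch" is just a slice of `i32` rows, and none of this is 
the arrow-rs `BatchCoalescer` implementation. The exact mode splits inputs so 
every output has exactly the target number of rows, while the relaxed mode never 
splits an input and emits as soon as at least the target number of rows is buffered.
   
   ```rust
   /// Toy stand-in for a batch coalescer (hypothetical, not the arrow-rs API).
   struct ToyCoalescer {
       target: usize,
       exact: bool,
       buffer: Vec<i32>,
       completed: Vec<Vec<i32>>,
   }
   
   impl ToyCoalescer {
       fn new(target: usize, exact: bool) -> Self {
           assert!(target > 0, "target batch size must be non-zero");
           Self { target, exact, buffer: Vec::new(), completed: Vec::new() }
       }
   
       fn push(&mut self, batch: &[i32]) {
           if self.exact {
               // Exact mode: slice the input so each output has exactly `target` rows.
               let mut rest = batch;
               while self.buffer.len() + rest.len() >= self.target {
                   let take = self.target - self.buffer.len();
                   self.buffer.extend_from_slice(&rest[..take]);
                   rest = &rest[take..];
                   self.completed.push(std::mem::take(&mut self.buffer));
               }
               self.buffer.extend_from_slice(rest);
           } else {
               // Relaxed mode: never split an input; emit once `target` rows are reached.
               self.buffer.extend_from_slice(batch);
               if self.buffer.len() >= self.target {
                   self.completed.push(std::mem::take(&mut self.buffer));
               }
           }
       }
   
       /// Flush any remaining buffered rows as a final (possibly short) batch.
       fn finish(&mut self) {
           if !self.buffer.is_empty() {
               self.completed.push(std::mem::take(&mut self.buffer));
           }
       }
   }
   
   /// Feed `inputs` through the toy coalescer and report each output batch's row count.
   fn output_sizes(target: usize, exact: bool, inputs: &[Vec<i32>]) -> Vec<usize> {
       let mut c = ToyCoalescer::new(target, exact);
       for batch in inputs {
           c.push(batch);
       }
       c.finish();
       c.completed.iter().map(|b| b.len()).collect()
   }
   
   fn main() {
       let inputs = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];
       println!("exact:   {:?}", output_sizes(4, true, &inputs));
       println!("relaxed: {:?}", output_sizes(4, false, &inputs));
   }
   ```
   
   In this model the relaxed mode does one append per input and no slicing, at 
the cost of output batches that can exceed the target size. Whether that saving 
shows up in the real kernel is exactly what a POC benchmark would need to confirm.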
   
   Thank you for your willingness to help with this project. It is 
something I have thought was important (but not critical) for a long time, 
so having someone to help really makes a big difference.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

