alamb commented on PR #17105: URL: https://github.com/apache/datafusion/pull/17105#issuecomment-3174858830
> > 🤔 the new kernel seems to slow things down. I wonder if the overhead of producing precisely sized output batches is causing the issue
>
> Good point @alamb, I agree this is the only difference. I can add a test PR so that upstream does not generate precisely sized output batches, but when we ensure capacity for the increment buffer size, it seems we need to change the size, since we do not keep the same target size for this change.
>
> The latest benchmark seems a little better.

Thanks @zhuqi-lucas -- what I was thinking about was something like the following

```rust
let target_batch_size = 4;
let mut coalescer = BatchCoalescer::new(batch1.schema(), target_batch_size)
  .with_exact_size(false);
```

Before we spend a lot of time polishing / testing a PR for that, it would probably be good to hack up a POC and verify it actually improves performance.

Thank you for your willingness to help along with this project. It is something I have thought was important (but not critical) for a long time, and so having someone to help really makes a big difference.
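
To make the exact vs. non-exact distinction concrete, here is a rough, std-only toy model of the idea. It is not arrow-rs or DataFusion code: `ToyCoalescer`, its `exact` flag, and the `batch_sizes` helper are hypothetical names, a "batch" is just a `Vec<i32>` of rows, and the relaxed policy shown (never split an input batch, emit once at least `target` rows are buffered) is only one assumed reading of what a non-exact mode could do.

```rust
/// Toy stand-in for a batch coalescer (hypothetical, not the arrow-rs type).
/// A "batch" here is simply a Vec<i32> of rows.
struct ToyCoalescer {
    target: usize,
    exact: bool,
    buffer: Vec<i32>,
    completed: Vec<Vec<i32>>,
}

impl ToyCoalescer {
    fn new(target: usize, exact: bool) -> Self {
        assert!(target > 0, "target batch size must be non-zero");
        Self { target, exact, buffer: Vec::new(), completed: Vec::new() }
    }

    fn push(&mut self, batch: &[i32]) {
        if self.exact {
            // Exact mode: split the input so that every emitted batch
            // (except possibly the last) has exactly `target` rows.
            let mut rest = batch;
            while !rest.is_empty() {
                let take = (self.target - self.buffer.len()).min(rest.len());
                self.buffer.extend_from_slice(&rest[..take]);
                rest = &rest[take..];
                if self.buffer.len() == self.target {
                    self.flush();
                }
            }
        } else {
            // Relaxed mode (assumed policy): never split an input batch;
            // emit as soon as at least `target` rows are buffered.
            self.buffer.extend_from_slice(batch);
            if self.buffer.len() >= self.target {
                self.flush();
            }
        }
    }

    /// Move any buffered rows into the list of completed batches.
    fn flush(&mut self) {
        if !self.buffer.is_empty() {
            self.completed.push(std::mem::take(&mut self.buffer));
        }
    }

    /// Emit the remaining rows and return all completed batches.
    fn finish(mut self) -> Vec<Vec<i32>> {
        self.flush();
        self.completed
    }
}

/// Helper: row counts of the output batches for a given mode.
fn batch_sizes(target: usize, exact: bool, inputs: &[Vec<i32>]) -> Vec<usize> {
    let mut coalescer = ToyCoalescer::new(target, exact);
    for batch in inputs {
        coalescer.push(batch);
    }
    coalescer.finish().iter().map(Vec::len).collect()
}
```

With three 3-row inputs and a target of 4, calling `batch_sizes(4, true, &inputs)` versus `batch_sizes(4, false, &inputs)` shows the trade-off: exact mode has to slice input batches to hit the target, while the relaxed mode copies each input whole and lets output sizes overshoot. Whether that saves measurable time in the real kernel is exactly what a POC benchmark would need to confirm.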