alamb commented on PR #17105:
URL: https://github.com/apache/datafusion/pull/17105#issuecomment-3174858830
> > 🤔 the new kernel seems to slow down. I wonder if the overhead of
precisely sized output batches is causing the issue
>
> Good point @alamb, I agree this is the only difference. I can add a test
PR so that upstream does not generate precisely sized output batches, but when we
ensure capacity for the incremental buffer size, it seems we need to change the
size, since we do not keep the same target size for this change.
>
> The latest benchmark seems a little better.
Thanks @zhuqi-lucas -- what I was thinking about was something like the
following:
```rust
let target_batch_size = 4;
// `with_exact_size` is a proposed option here (a sketch of the idea to try)
let mut coalescer = BatchCoalescer::new(batch1.schema(), target_batch_size)
    .with_exact_size(false);
```
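To make the trade-off concrete, here is a minimal, self-contained sketch. It is a toy and not the arrow-rs `BatchCoalescer`: a "batch" is just a `Vec<i32>` of rows, and `ToyCoalescer`, `coalesce_all`, and the `splits` counter are hypothetical names invented for illustration. It only assumes that exact sizing means cutting input batches at the target boundary, while the relaxed mode emits whatever is buffered once the target is reached.

```rust
/// Toy stand-in for a batch coalescer (hypothetical, for illustration only).
struct ToyCoalescer {
    target: usize,
    exact: bool,
    buffer: Vec<i32>,
    completed: Vec<Vec<i32>>,
    /// Number of input batches that had to be cut at a target boundary.
    splits: usize,
}

impl ToyCoalescer {
    fn new(target: usize) -> Self {
        assert!(target > 0, "target batch size must be non-zero");
        Self {
            target,
            exact: true,
            buffer: Vec::new(),
            completed: Vec::new(),
            splits: 0,
        }
    }

    /// Mirrors the shape of the proposed builder option.
    fn with_exact_size(mut self, exact: bool) -> Self {
        self.exact = exact;
        self
    }

    fn push_batch(&mut self, batch: &[i32]) {
        if self.exact {
            // Exact mode: fill the buffer to exactly `target` rows,
            // splitting the input batch whenever it does not fit.
            let mut rest = batch;
            while !rest.is_empty() {
                let room = self.target - self.buffer.len();
                let take = room.min(rest.len());
                if take < rest.len() {
                    self.splits += 1;
                }
                self.buffer.extend_from_slice(&rest[..take]);
                rest = &rest[take..];
                if self.buffer.len() == self.target {
                    self.completed.push(std::mem::take(&mut self.buffer));
                }
            }
        } else {
            // Relaxed mode: append the whole batch, then emit once the
            // buffer holds at least `target` rows. No splitting needed.
            self.buffer.extend_from_slice(batch);
            if self.buffer.len() >= self.target {
                self.completed.push(std::mem::take(&mut self.buffer));
            }
        }
    }

    /// Flush any remaining rows and return (output batches, split count).
    fn finish(mut self) -> (Vec<Vec<i32>>, usize) {
        if !self.buffer.is_empty() {
            self.completed.push(std::mem::take(&mut self.buffer));
        }
        (self.completed, self.splits)
    }
}

/// Convenience helper: run every input batch through a fresh coalescer.
fn coalesce_all(target: usize, exact: bool, batches: &[Vec<i32>]) -> (Vec<Vec<i32>>, usize) {
    let mut coalescer = ToyCoalescer::new(target).with_exact_size(exact);
    for batch in batches {
        coalescer.push_batch(batch);
    }
    coalescer.finish()
}
```

With three 3-row input batches and a target of 4, the exact mode of this toy produces output batches of 4, 4, and 1 rows and has to split two of the inputs, while the relaxed mode produces batches of 6 and 3 rows with no splits. Whether that extra splitting work is what slows the real kernel is what a POC would need to measure.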
Before we spend a lot of time polishing / testing a PR for that, it would
probably be good to hack up a POC and verify it actually improves performance.
Thank you for your willingness to help along with this project. It is
something I have thought was important (but not critical) for a long time, and
so having someone to help really makes a big difference.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.