mbutrovich commented on PR #3703: URL: https://github.com/apache/datafusion-comet/pull/3703#issuecomment-4070359121
@andygrove helped me out and ran TPC-H SF1000. We saw the biggest wins on TPC-H Q2, Q18, and Q20, where operators (likely joins) emitted many small batches. I'm not sure yet why those weren't being coalesced within a partition, but the wins were huge in some of these degenerate cases. For example, here is the current behavior on the `main` branch, highlighting one `CometBroadcastHashJoin` in TPC-H Q2:

<img width="661" height="687" alt="Screenshot 2026-03-16 at 4 12 03 PM" src="https://github.com/user-attachments/assets/8afeee9b-9abd-4d6c-90ef-eddb19e8366d" />

Compare to PR #3703:

<img width="718" height="659" alt="Screenshot 2026-03-16 at 4 12 14 PM" src="https://github.com/user-attachments/assets/3fe14a92-18da-4707-a968-5048abc3f1b3" />

I'll look into whether we're missing a coalescing opportunity at the output of the previous stage, since I'm a bit surprised at the behavior here.
