andygrove opened a new pull request, #3235: URL: https://github.com/apache/datafusion-comet/pull/3235
## Summary

- Fix `CometBroadcastExchangeExec` to properly use AQE's coalesced shuffle partition specs
- When `AQEShuffleReadExec` wraps a `CometShuffleExchangeExec`, use `getShuffleRDD(partitionSpecs)` to get the coalesced RDD instead of bypassing AQE

## Problem

When AQE (Adaptive Query Execution) coalesces shuffle partitions based on runtime statistics, `CometBroadcastExchangeExec` was bypassing this optimization by reading directly from the inner shuffle plan. This caused broadcast exchanges to spawn many tasks (e.g., 200) even when AQE determined that only 1 task was needed due to small data volume after filtering.

For example, in TPC-H Q18, after the filter `sum(l_quantity) > 313`, there are only ~900 rows to broadcast. AQE correctly identifies this and coalesces 200 shuffle partitions into 1. However, Comet's broadcast exchange was ignoring this and still reading from all 200 original partitions.

## Solution

Added proper handling for `AQEShuffleReadExec` wrapping `CometShuffleExchangeExec`:

1. Extract partition specs from `AQEShuffleReadExec`
2. Use `CometShuffleExchangeExec.getShuffleRDD(partitionSpecs)` to get the coalesced RDD
3. Serialize batches from the coalesced RDD for broadcasting

## Test plan

- [x] Existing unit tests pass (`CometExecSuite`, `CometShuffleSuite`)
- [x] Verified with TPC-H Q18: broadcast stages now use 1 task instead of 200 when AQE coalesces partitions

🤖 Generated with [Claude Code](https://claude.com/claude-code)
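
To illustrate why honoring the partition specs changes the task count, here is a minimal, self-contained toy model in Python. It is not Spark's or Comet's code: `CoalescedSpec`, `coalesce`, `tasks_bypassing_aqe`, and `tasks_using_specs` are hypothetical names, the greedy merge is a simplified stand-in for AQE's coalescing rule, and the byte sizes are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CoalescedSpec:
    """Hypothetical stand-in for a coalesced partition spec.

    Covers the original reducer partitions in the half-open range [start, end).
    """
    start: int
    end: int


def coalesce(sizes: List[int], target_bytes: int) -> List[CoalescedSpec]:
    """Greedily merge adjacent partitions while their total stays within target_bytes.

    A simplified model of runtime coalescing, not Spark's actual algorithm.
    """
    specs: List[CoalescedSpec] = []
    start = 0
    acc = 0
    for i, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed the target.
        if i > start and acc + size > target_bytes:
            specs.append(CoalescedSpec(start, i))
            start = i
            acc = 0
        acc += size
    if sizes:
        specs.append(CoalescedSpec(start, len(sizes)))
    return specs


def tasks_bypassing_aqe(sizes: List[int]) -> int:
    """Reading the inner shuffle directly: one task per original partition."""
    return len(sizes)


def tasks_using_specs(specs: List[CoalescedSpec]) -> int:
    """Reading through the coalesced specs: one task per spec."""
    return len(specs)


# Assumed numbers: 200 tiny shuffle partitions and a 64 MiB target size.
sizes = [64] * 200
specs = coalesce(sizes, target_bytes=64 * 1024 * 1024)

print(tasks_bypassing_aqe(sizes))  # 200
print(tasks_using_specs(specs))    # 1
```

In this model, the behavior before the fix corresponds to `tasks_bypassing_aqe`, and the behavior after the fix corresponds to `tasks_using_specs`, which follows whatever grouping the coalescing step produced.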
