ctsk commented on PR #15479: URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2817209539
- [benchmarks/hashagg-results.md](https://github.com/apache/datafusion/blob/05db68ca2e5efba9b76759178f50a0595ffd9939/benchmarks/hashagg-results.md)
- [benchmarks/join-results.md](https://github.com/apache/datafusion/blob/05db68ca2e5efba9b76759178f50a0595ffd9939/benchmarks/join-results.md)
- [benchmarks/sort-results.md](https://github.com/apache/datafusion/blob/05db68ca2e5efba9b76759178f50a0595ffd9939/benchmarks/sort-results.md)

I've checked in the results because I think they would be too large to include as a comment. Each file contains the results of reducing the coalesce threshold in front of a single operator type: joins, hash aggregations, or sorts. Coalescing in front of all other operators remains unchanged.

The number after each configuration name is the coalesce threshold it used. SORT0 means that the CoalesceBatches operators in front of sorts were removed entirely. SORT256 means that the CoalesceBatches operator in front of each sort was configured to emit a batch once it had buffered 256 rows. The same naming applies to joins and hash aggregations. Each CHANGE column gives the relative change of the column to its right compared with the base column (the baseline from main at the point where this PR branched off).

The benchmarks were run with 16 target partitions. I suspect that the more target partitions there are, the smaller the batches produced by RepartitionExec become. Therefore, removing coalescing might work better with smaller target partition counts (for hash aggregations and joins).
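To make the suspected interaction between target partitions and the coalesce threshold concrete, here is a simplified Rust model. It is only a sketch and is not DataFusion's `CoalesceBatchesExec` or `RepartitionExec`. It makes several assumptions:

- A batch is represented only by its row count.
- Hash repartitioning is assumed to split every input batch evenly across partitions, whereas real data can be skewed.
- A threshold of 0 stands in for removing the coalesce operator, as in the SORT0 configuration.
- The 8192-row input batch matches DataFusion's default `batch_size`.

The names `repartition`, `Coalescer`, and `simulate` are hypothetical.

```rust
/// Hypothetical model of hash repartitioning: splits a batch of `batch_rows`
/// rows across `partitions` outputs. Assumes a perfectly even split.
fn repartition(batch_rows: usize, partitions: usize) -> Vec<usize> {
    let base = batch_rows / partitions;
    let rem = batch_rows % partitions;
    // The first `rem` partitions receive one extra row.
    (0..partitions).map(|p| base + usize::from(p < rem)).collect()
}

/// Hypothetical coalescer: buffers incoming row counts and emits once the
/// buffered total reaches `threshold`. A threshold of 0 passes batches
/// straight through, modelling a removed CoalesceBatches operator.
struct Coalescer {
    threshold: usize,
    buffered: usize,
}

impl Coalescer {
    fn new(threshold: usize) -> Self {
        Self { threshold, buffered: 0 }
    }

    /// Pushes one batch. Returns the size of the emitted batch, if any.
    fn push(&mut self, rows: usize) -> Option<usize> {
        if self.threshold == 0 {
            return Some(rows);
        }
        self.buffered += rows;
        if self.buffered >= self.threshold {
            let out = self.buffered;
            self.buffered = 0;
            Some(out)
        } else {
            None
        }
    }

    /// Flushes any rows still buffered at end of input.
    fn finish(&mut self) -> Option<usize> {
        if self.buffered > 0 {
            let out = self.buffered;
            self.buffered = 0;
            Some(out)
        } else {
            None
        }
    }
}

/// Feeds `input_batches` batches of `batch_rows` rows through the
/// repartition model. It observes output partition 0 only and returns the
/// batch sizes its coalescer emits.
fn simulate(input_batches: usize, batch_rows: usize, partitions: usize, threshold: usize) -> Vec<usize> {
    let mut coalescer = Coalescer::new(threshold);
    let mut emitted = Vec::new();
    for _ in 0..input_batches {
        let rows = repartition(batch_rows, partitions)[0];
        if let Some(n) = coalescer.push(rows) {
            emitted.push(n);
        }
    }
    if let Some(n) = coalescer.finish() {
        emitted.push(n);
    }
    emitted
}

fn main() {
    // 32 input batches of 8192 rows each, fed into 16 or 64 partitions.
    for &partitions in &[16usize, 64] {
        for &threshold in &[0usize, 256, 8192] {
            let sizes = simulate(32, 8192, partitions, threshold);
            println!(
                "partitions={partitions} threshold={threshold}: {} batches, first has {} rows",
                sizes.len(),
                sizes[0]
            );
        }
    }
}
```

Under these assumptions, each partition receives 512-row slices with 16 partitions. A threshold of 256 therefore never buffers anything, and it behaves the same as threshold 0. With 64 partitions, the slices shrink to 128 rows, and the same threshold starts merging pairs of them. This is consistent with the suspicion above, but it does not confirm it, since real repartition output depends on the data distribution.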