ctsk commented on PR #15479:
URL: https://github.com/apache/datafusion/pull/15479#issuecomment-2817209539

   
[benchmarks/hashagg-results.md](https://github.com/apache/datafusion/blob/05db68ca2e5efba9b76759178f50a0595ffd9939/benchmarks/hashagg-results.md)
   
[benchmarks/join-results.md](https://github.com/apache/datafusion/blob/05db68ca2e5efba9b76759178f50a0595ffd9939/benchmarks/join-results.md)
   
[benchmarks/sort-results.md](https://github.com/apache/datafusion/blob/05db68ca2e5efba9b76759178f50a0595ffd9939/benchmarks/sort-results.md)
   
   I've checked in the results because I think they would be too large to 
include as a comment.
   
   Each file contains the results of reducing the coalesce threshold for a 
single operator type: joins, hash aggregations, or sorts. Coalescing before all 
other operators remains unchanged. The number at the end of each configuration 
name is the coalesce threshold that was used: SORT0 means that the 
CoalesceBatches operators were removed entirely, whereas SORT256 means that the 
CoalesceBatches operator in front of a sort was configured to emit a batch once 
it had 256 rows buffered. The same applies to joins and hash aggregations.
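
   To make the threshold semantics concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not DataFusion's actual CoalesceBatches implementation; the function name `coalesce` and the use of plain lists as batches are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def coalesce(batches: Iterable[List[int]], threshold: int) -> Iterator[List[int]]:
    """Toy model: buffer incoming batches and emit once `threshold` rows are buffered."""
    if threshold <= 0:
        # Models the "0" configurations (e.g. SORT0): no coalescing at all,
        # every input batch is passed through unchanged.
        yield from batches
        return

    buffered: List[int] = []
    for batch in batches:
        buffered.extend(batch)
        if len(buffered) >= threshold:
            # Enough rows are buffered: emit them as a single batch.
            yield buffered
            buffered = []
    if buffered:
        # Flush whatever is left at the end of the input.
        yield buffered


# Five hypothetical input batches of 100 rows each.
small_batches = [list(range(100)) for _ in range(5)]
sizes_256 = [len(b) for b in coalesce(small_batches, 256)]
sizes_0 = [len(b) for b in coalesce(small_batches, 0)]
```

   With a threshold of 256, the five 100-row batches leave this model as two larger batches; with a threshold of 0 they pass through untouched.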
   The CHANGE value is the relative change of the column to its right compared 
with the base column (the baseline from when this PR branched off main).
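
   As a worked example of how a CHANGE value could be read, the sketch below assumes it is computed as (candidate - base) / base. That formula, the function name, and the timings are illustrative assumptions, not values or code taken from the result files.

```python
def relative_change(base_ms: float, candidate_ms: float) -> float:
    """Relative change of a candidate timing with respect to the base timing."""
    return (candidate_ms - base_ms) / base_ms


# Hypothetical timings: base run 200 ms, candidate configuration 180 ms.
change = relative_change(200.0, 180.0)
formatted = f"{change:+.2%}"  # negative means the candidate was faster
```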
   
   The benchmarks were run with 16 target partitions. I suspect that the more 
target partitions there are, the smaller the batches produced by 
RepartitionExec become. Therefore, removing coalesce might work better with 
smaller target partition counts (for hash aggregation and joins).
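
   A rough back-of-the-envelope sketch of that suspicion, assuming each input batch is split evenly across the target partitions and that input batches hold 8192 rows (DataFusion's default batch size, which may not match the benchmark setup). The helper name is hypothetical.

```python
def expected_rows_per_output_batch(input_rows: int, target_partitions: int) -> float:
    """Expected rows per repartitioned batch if rows are spread evenly."""
    return input_rows / target_partitions


# Assumed 8192-row input batches split across different partition counts.
estimates = {p: expected_rows_per_output_batch(8192, p) for p in (4, 16, 64)}
```

   Under these assumptions, 16 target partitions yield batches of roughly 512 rows, and the estimate shrinks further as the partition count grows.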


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

