Re: [PR] feat(aqe): SplitPartitionsRule — file-list sharding for skewed shuffle partitions (v1) [datafusion-ballista]

via GitHub Tue, 26 May 2026 00:07:34 -0700


wirybeaver commented on PR #1718:
URL: 
https://github.com/apache/datafusion-ballista/pull/1718#issuecomment-4541460753


   Closing in favor of porting Spark's `OptimizeSkewedJoin` AQE rule. File-list 
sharding bails on any hash/single-partition consumer (joins, FinalPartitioned 
aggregates, global limits), which covers most realistic workloads — including 
the TPC-H Q2 case this PR aimed to address. Spark's split+replicate pattern 
handles the highest-value skew shape (skewed joins) directly and is correct by 
construction. New work will land on branch `optimize-skewed-join`.
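
For readers unfamiliar with the split+replicate pattern, here is a minimal sketch of the idea, not the code planned for `optimize-skewed-join`. `JoinTask`, `is_skewed`, and `split_and_replicate` are hypothetical names chosen for illustration. The skew test is modeled on Spark's two conditions as I understand them (size above a factor times the median partition size, and above an absolute byte threshold). The sketch assumes an inner join, where splitting one side is safe.

```rust
use std::ops::Range;

/// Hypothetical join task: a slice of the skewed side's map outputs for
/// `partition`, joined against the other side's ENTIRE `partition`
/// (that full re-read is the "replicate" half of the pattern).
#[derive(Debug, PartialEq)]
struct JoinTask {
    partition: usize,
    skewed_map_outputs: Range<usize>,
}

/// Skew test modeled on Spark's rule: the partition must exceed
/// `factor` times the median size AND an absolute byte threshold.
fn is_skewed(size: u64, median: u64, factor: f64, threshold: u64) -> bool {
    size as f64 > median as f64 * factor && size > threshold
}

/// Greedily pack consecutive map outputs of the skewed partition into
/// chunks of roughly `target` bytes; each chunk becomes one join task.
fn split_and_replicate(
    partition: usize,
    map_output_sizes: &[u64],
    target: u64,
) -> Vec<JoinTask> {
    let mut tasks = Vec::new();
    let mut start = 0;
    let mut acc = 0u64;
    for (i, &size) in map_output_sizes.iter().enumerate() {
        // Close the current chunk if adding this output would overshoot.
        if acc > 0 && acc + size > target {
            tasks.push(JoinTask { partition, skewed_map_outputs: start..i });
            start = i;
            acc = 0;
        }
        acc += size;
    }
    // Emit the trailing chunk, if any outputs remain.
    if start < map_output_sizes.len() {
        tasks.push(JoinTask {
            partition,
            skewed_map_outputs: start..map_output_sizes.len(),
        });
    }
    tasks
}
```

Every row of the skewed side lands in exactly one chunk, and every chunk is joined against all of the other side's rows for that partition, so the union of the task outputs equals the output of the unsplit inner join. That is the sense in which the pattern is correct by construction.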


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

