wirybeaver commented on PR #1718:
URL: 
https://github.com/apache/datafusion-ballista/pull/1718#issuecomment-4541460753

   Closing in favor of porting Spark's `OptimizeSkewedJoin` AQE rule. File-list 
sharding bails on any hash/single-partition consumer (joins, FinalPartitioned 
aggregates, global limits), which covers most realistic workloads — including 
the TPC-H Q2 case this PR aimed to address. Spark's split+replicate pattern 
handles the highest-value skew shape (skewed joins) directly and is correct by 
construction. New work will land on branch `optimize-skewed-join`.
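
For readers unfamiliar with the split+replicate idea, here is a minimal sketch in Python of the inner-join case. It is not Spark's or Ballista's implementation: the function names (`hash_partition`, `split_skewed`, `join_task`) and the row-count threshold `max_rows` are hypothetical, and a real rule would work from shuffle statistics rather than in-memory lists. The oversized partition on one side is split into chunks, and the matching partition on the other side is replicated to every chunk, so the union of the per-task results should equal the unsplit join.

```python
from collections import defaultdict


def hash_partition(rows, num_partitions):
    """Group (key, value) rows into partitions by hash of the join key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def split_skewed(left_parts, right_parts, max_rows):
    """Build join tasks as (left_chunk, right_partition) pairs.

    A left partition with more than max_rows rows (a hypothetical skew
    threshold) is split into chunks of at most max_rows rows; the matching
    right partition is reused, i.e. replicated, for every chunk.
    """
    tasks = []
    for left, right in zip(left_parts, right_parts):
        if len(left) <= max_rows:
            tasks.append((left, right))
            continue
        for start in range(0, len(left), max_rows):
            tasks.append((left[start:start + max_rows], right))
    return tasks


def join_task(left, right):
    """Inner hash join of one left chunk against one right partition."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, ())]
```

Because every left row lands in exactly one chunk and each chunk sees the full matching right partition, no match is lost or duplicated in this inner-join setting. Outer joins restrict which side can safely be split, which this sketch does not model.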


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

