zml1206 commented on PR #6048:
URL: 
https://github.com/apache/incubator-gluten/pull/6048#issuecomment-2164300493

   > > However, because the plan has not changed, the plan will not be replaced
   > 
   > That's not certain: AQE will use the new plan if [currentPhysicalPlan != 
newPhysicalPlan](https://github.com/apache/spark/blob/fd045c9887feabc37c0f15fa41c860847f5fffa0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala#L381-L389).
 So I think ShuffledHashJoin should always choose the small table. 
SortMergeJoin is a special case in which Spark always builds the right side for 
inner join, etc.
   > 
   > If your goal is to optimize the vanilla Spark SortMergeJoin, I think it's 
better to push it to the Spark community first. For Gluten, we can just 
optimize SortMergeJoin when doing the transform.
   
   First of all, the primary goal is to ensure that the plan after fallback is 
consistent with vanilla Spark, so we should not force the generation of 
ShuffledHashJoinExec. Instead, we should convert 
ShuffledHashJoinExec/SortMergeJoinExec into ShuffledHashJoinExecTransformer 
during offload.
   Secondly, vanilla Spark only supports using the smaller table as the build 
side starting from version 3.5. Before 3.5 this is not supported, so neither 
ShuffledHashJoinExec before 3.5 nor SortMergeJoinExec is guaranteed to choose 
the smaller table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

