[
https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407823#comment-17407823
]
Cheng Su commented on SPARK-36612:
----------------------------------
I agree that some queries fit this scenario. For those queries we can skip the
sort before the join if we are able to use a shuffled hash join instead of a
sort merge join.
I don't think it solves the AQE skew problem, though. We still cannot split a
skewed partition on the right side of a LEFT OUTER join, because the tasks
processing the split pieces share no knowledge at runtime of which rows have
or have not been matched.
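
To make the point above concrete, here is a minimal pure-Python sketch. It is
not Spark's implementation: the function name left_outer_build_left and the
toy rows are hypothetical, and the "tasks" are just two separate function
calls. It assumes a build-left hash join that tracks a matched flag per build
row, then shows what happens if the right-side partition is split across two
tasks that each get their own copy of the left side.

```python
from collections import defaultdict


def left_outer_build_left(left_rows, right_rows):
    """Left outer join that hashes the left side and streams the right side.

    Rows are (key, value) tuples; output rows are (key, left_value, right_value).
    Hypothetical helper for illustration only, not a Spark API.
    """
    # Build phase: hash table over the (small) left side.
    table = defaultdict(list)
    for idx, (key, _) in enumerate(left_rows):
        table[key].append(idx)
    # One matched flag per build row, local to this task.
    matched = [False] * len(left_rows)

    out = []
    # Probe phase: stream the (large) right side.
    for key, right_value in right_rows:
        for idx in table.get(key, ()):
            matched[idx] = True
            out.append((key, left_rows[idx][1], right_value))

    # Final phase: emit left rows this task never saw a match for.
    for idx, (key, left_value) in enumerate(left_rows):
        if not matched[idx]:
            out.append((key, left_value, None))
    return out


left = [(1, "a"), (2, "b")]
right = [(1, "x"), (1, "y"), (2, "z")]

# One task sees the whole right partition.
single_task = left_outer_build_left(left, right)

# The right partition is split in two; each task gets a full copy of the left side.
split_tasks = (
    left_outer_build_left(left, right[:2])     # this task sees only key 1
    + left_outer_build_left(left, right[2:])   # this task sees only key 2
)

# Rows produced only by the split execution.
spurious = sorted(set(split_tasks) - set(single_task))
print(spurious)
```

In this toy run each task emits a null-padded row for the left row that was
matched only in the other task, so the split execution returns rows that the
single-task join does not. Avoiding that would need the matched flags to be
shared or merged across tasks.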
> Support left outer join build left or right outer join build right in
> shuffled hash join
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-36612
> URL: https://issues.apache.org/jira/browse/SPARK-36612
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: mcdull_zhang
> Priority: Major
>
> Currently Spark SQL does not support building the left side for a left outer
> join (or building the right side for a right outer join).
> However, in our production environment there are many scenarios where a small
> table is left joined to a large table, and the large table often has data
> skew (which AQE currently cannot handle).
> Inspired by SPARK-32399, we can use a similar idea to implement left outer
> join with build left.
> I think this change would be very worthwhile, but I would like to know what
> other members think.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)