[ https://issues.apache.org/jira/browse/SPARK-42503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-42503:
--------------------------------
    Target Version/s:   (was: 3.4.0)

> Spark SQL should do further validation on join condition fields
> ---------------------------------------------------------------
>
>                 Key: SPARK-42503
>                 URL: https://issues.apache.org/jira/browse/SPARK-42503
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.2
>            Reporter: zzzzming95
>            Priority: Major
>             Fix For: 3.4.0
>
>
> In Spark SQL, the fields used in a join condition are allowed to come from
> tables other than the join's left or right table. In that case, the join
> degenerates into a cross join.
> Suppose there are two tables, test1 and test2, with the same schema:
> {code:java}
> CREATE TABLE `default`.`test1` (
>   `id` INT,
>   `name` STRING,
>   `age` INT,
>   `dt` STRING)
> USING parquet
> PARTITIONED BY (dt)
> {code}
> The following SQL joins three subqueries with two left joins, but in the
> last left join the condition is `t1.name=t2.name` and t3.name is not
> referenced at all, so that join degenerates into a cross join with t3:
> {code:java}
> select *
> from
>   (select * from test1 where dt="20230215" and age=1) t1
> left join
>   (select * from test1 where dt="20230215" and age=2) t2
>   on t1.name=t2.name
> left join
>   (select * from test2 where dt="20230215") t3
>   on t1.name=t2.name;
> {code}
> So I think Spark SQL should do further validation on join conditions: the
> fields of a join condition must come from the join's left or right table;
> otherwise an `AnalysisException` should be thrown.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
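One possible reading of the proposed validation can be sketched outside of Spark: require the ON clause of each join to reference at least one column from each side, since a condition that ignores one side (as `t1.name=t2.name` ignores t3 above) degenerates that join into a cross join. This is a simplified, hypothetical model of the check, not Spark's actual analyzer; all names (`AnalysisError`, `validate_join_condition`, the `(alias, column)` pair encoding) are illustrative assumptions.

```python
# Hypothetical sketch of the proposed join-condition validation.
# Not Spark code: a toy model where column references are (alias, column) pairs.

class AnalysisError(Exception):
    """Stands in for Spark's AnalysisException in this sketch."""


def validate_join_condition(condition_cols, left_cols, right_cols):
    """Reject a join whose ON clause ignores one of its two sides.

    condition_cols: set of (alias, column) pairs referenced in the ON clause
    left_cols / right_cols: sets of (alias, column) pairs each side exposes
    """
    cond = set(condition_cols)
    # A condition that touches neither side's columns, or only one side's,
    # cannot constrain the join and effectively produces a cross join.
    if not (cond & set(left_cols)) or not (cond & set(right_cols)):
        raise AnalysisError(
            f"Join condition {sorted(cond)} does not reference both sides"
        )


# The last join from the example: the left subtree exposes t1 and t2 columns,
# the right side is the t3 subquery, but the condition only uses t1 and t2.
left = {("t1", "name"), ("t2", "name")}
right = {("t3", "name")}
try:
    validate_join_condition({("t1", "name"), ("t2", "name")}, left, right)
except AnalysisError as e:
    print(e)  # the proposed check would reject this query
```

A condition such as `t1.name=t3.name` would pass the same check, since it references columns from both the left subtree and the t3 side.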