[
https://issues.apache.org/jira/browse/SPARK-42503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuming Wang updated SPARK-42503:
--------------------------------
Fix Version/s: (was: 3.4.0)
> Spark SQL should do further validation on join condition fields
> ---------------------------------------------------------------
>
> Key: SPARK-42503
> URL: https://issues.apache.org/jira/browse/SPARK-42503
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.2
> Reporter: zzzzming95
> Priority: Major
>
> In Spark SQL, the conditions for the join use fields that are allowed to be
> fields from a non-left table or a non-right table. In this case, the join
> will degenerate into a cross join.
> Suppose you have two tables, test1 and test2, which have the same table
> schema:
> {code:java}
> ```
> CREATE TABLE `default`.`test1` (
> `id` INT,
> `name` STRING,
> `age` INT,
> `dt` STRING)
> USING parquet
> PARTITIONED BY (dt)
> ```{code}
> The following SQL has three joins, but in the last left join, the conditions
> is `t1.name=t2.name`, and t3.name is not used. So the last left join will be
> cross join.
> {code:java}
> ```
> select *
> from
> (select * from test1 where dt="20230215" and age=1 ) t1
> left join
> (select * from test1 where dt=="20230215" and age=2) t2
> on t1.name=t2.name
> left join
> (select * from test2 where dt="20230215") t3
> on
> t1.name=t2.name;
> ```{code}
> So i think Spark SQL should do further validation on join condition, the
> fields of join condition must be a left table or right table field ,
> otherwise it is thrown `AnalysisException`.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]