zzzzming95 created SPARK-42503:
----------------------------------
Summary: Spark SQL should do further validation on join condition fields
Key: SPARK-42503
URL: https://issues.apache.org/jira/browse/SPARK-42503
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.2
Reporter: zzzzming95
Fix For: 3.4.0
In Spark SQL, the fields used in a join condition are allowed to come from
tables other than the join's left and right tables. In that case, the join
degenerates into a cross join.
Suppose you have two tables, test1 and test2, which have the same table schema:
```
CREATE TABLE `default`.`test1` (
`id` INT,
`name` STRING,
`age` INT,
`dt` STRING)
USING parquet
PARTITIONED BY (dt)
```
The following SQL has two joins, but in the last left join the condition is
`t1.name=t2.name`, and t3.name is not used at all. So the last left join is
effectively a cross join.
```
select *
from
(select * from test1 where dt="20230215" and age=1 ) t1
left join
(select * from test1 where dt="20230215" and age=2) t2
on t1.name=t2.name
left join
(select * from test2 where dt="20230215") t3
on
t1.name=t2.name;
```
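To see why the last join degenerates, here is a toy sketch in plain Python (not Spark; the rows and the `left_join` helper are made up for illustration): when the condition never looks at the right-hand row, every right-hand row matches, so the output is the full cross product.

```python
# Toy illustration (not Spark): a left join whose condition ignores the
# right-hand row keeps every right-hand row for each left-hand row,
# i.e. it behaves like a cross join.

t1_t2 = [{"t1_name": "a", "t2_name": "a"}]          # result of the first join
t3 = [{"name": "a"}, {"name": "b"}, {"name": "c"}]  # rows of t3

def left_join(left_rows, right_rows, condition):
    out = []
    for l in left_rows:
        matches = [{**l, **r} for r in right_rows if condition(l, r)]
        # Left join semantics: keep the left row even with no match.
        out.extend(matches if matches else [l])
    return out

# The condition `t1.name = t2.name` never inspects the t3 row `r`:
result = left_join(t1_t2, t3, lambda l, r: l["t1_name"] == l["t2_name"])

# Every t3 row matched, so the output has len(t1_t2) * len(t3) rows.
print(len(result))  # 3
```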
So I think Spark SQL should do further validation on join conditions: the
fields of a join condition must be fields of the left table or the right
table, otherwise an `AnalysisException` should be thrown.
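A validation pass along these lines could be sketched as follows. This is a minimal illustration only, not Spark's analyzer; the function name and the `AnalysisException` stand-in are hypothetical, and the extra check shown (that a non-empty condition must reference at least one column from each input) is one possible strengthening that would also catch the example above, where both `t1.name` and `t2.name` belong to the left input of the last join.

```python
# Hypothetical sketch: validate that the qualified columns referenced by a
# join condition all belong to the join's inputs, and that both inputs are
# actually referenced (otherwise the join degenerates into a cross join).

class AnalysisException(Exception):
    """Stand-in for Spark's AnalysisException."""

def validate_join_condition(left_cols, right_cols, condition_cols):
    """left_cols / right_cols: qualified output columns of each join input;
    condition_cols: qualified columns referenced by the join condition."""
    unknown = condition_cols - (left_cols | right_cols)
    if unknown:
        raise AnalysisException(
            f"Join condition references unknown columns: {sorted(unknown)}")
    if not (condition_cols & left_cols):
        raise AnalysisException(
            "Join condition references no column of the left input; "
            "the join would degenerate into a cross join")
    if not (condition_cols & right_cols):
        raise AnalysisException(
            "Join condition references no column of the right input; "
            "the join would degenerate into a cross join")

# The problematic last join: t1/t2 columns form the left input, t3 the right.
left = {"t1.name", "t2.name"}
right = {"t3.name"}
try:
    validate_join_condition(left, right, {"t1.name", "t2.name"})  # on t1.name=t2.name
except AnalysisException as e:
    print("rejected:", e)
```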
--
This message was sent by Atlassian Jira
(v8.20.10#820010)