[ 
https://issues.apache.org/jira/browse/SPARK-53264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asif updated SPARK-53264:
-------------------------
    Description: 
In RewriteCorrelatedScalarSubquery, when a correlated scalar subquery is 
converted into a Left Outer Join, the Project node just above the Left Outer 
Join node marks the attribute coming from the right side of the join as 
non-nullable (nullable = false).
Any attribute coming from the right side of a Left Outer Join should instead 
be nullable (nullable = true), because unmatched left rows produce NULL for it.

This results in the following query (from SQLQueryTestSuite):
{quote}select *
from range(1, 3) t1
where (select t2.id c
from range (1, 2) t2 where t1.id = t2.id
) is not null
{quote}
eventually being wrongly optimized into an Inner Join.
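
To make the nullability point concrete, here is a minimal plain-Python sketch of the query above (no Spark involved). It models the Left Outer Join by hand and shows that the right-side value is NULL for the unmatched row, so the attribute has to be nullable. The helper names (left_outer_join, is_not_null_filter) and the "fold IS NOT NULL to TRUE when the attribute is declared non-nullable" step are assumptions made for illustration only; they are not Spark APIs and are not a claim about which optimizer rule is involved in this bug.

```python
# Plain-Python model of the query in this report (no Spark required).
# range(1, 3) yields ids 1 and 2; range(1, 2) yields id 1.
t1 = list(range(1, 3))
t2 = list(range(1, 2))


def left_outer_join(left, right):
    """Return (left_id, right_id) pairs; right_id is None when nothing matches."""
    rows = []
    for left_id in left:
        matches = [right_id for right_id in right if right_id == left_id]
        if matches:
            rows.extend((left_id, right_id) for right_id in matches)
        else:
            # Unmatched left row: the right side is NULL.
            rows.append((left_id, None))
    return rows


joined = left_outer_join(t1, t2)

# The right-side column contains a NULL, so it must be marked nullable.
right_side_nullable = any(c is None for _, c in joined)


def is_not_null_filter(rows, declared_nullable):
    """Hypothetical optimizer behaviour (an assumption, not a Spark API):
    if the attribute is declared non-nullable, `c IS NOT NULL` is treated
    as always TRUE and the filter keeps every row."""
    if not declared_nullable:
        return list(rows)
    return [(i, c) for i, c in rows if c is not None]


# With correct nullability the filter removes the unmatched row.
correct_ids = [i for i, _ in is_not_null_filter(joined, declared_nullable=True)]
# With nullability wrongly declared as false, the unmatched row survives.
wrong_ids = [i for i, _ in is_not_null_filter(joined, declared_nullable=False)]
```

In this model the query should return only the row for id 1; trusting a non-nullable flag on the right-side attribute is what lets an optimizer reason incorrectly about the IS NOT NULL predicate.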

But the bug remains hidden in the current code base because of an 
inefficiency in the PushDownPredicates rule, which indirectly hides the issue.

If PushDownPredicates worked efficiently (i.e., combined and pushed 
predicates in a single pass), the bug would be exposed.

The inefficiency in the PushDownPredicates rule is itself described in 
[SPARK-36786|https://issues.apache.org/jira/projects/SPARK/issues/SPARK-36786].

I will submit a PR with a test for the bug shortly.


> Conversion of correlated scalar subquery to Left Outer Join results in 
> nullability as false for the right side attribute
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-53264
>                 URL: https://issues.apache.org/jira/browse/SPARK-53264
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.1.0, 4.0.0
>            Reporter: Asif
>            Priority: Major


