[ 
https://issues.apache.org/jira/browse/SPARK-28220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303057#comment-17303057
 ] 

Apache Spark commented on SPARK-28220:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/31857

> join foldable condition not pushed down when parent filter is totally pushed 
> down
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-28220
>                 URL: https://issues.apache.org/jira/browse/SPARK-28220
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2, 3.0.0
>            Reporter: liupengcheng
>            Priority: Major
>
> We encountered a issue that join conditions not pushed down when we are 
> running spark app on spark2.3, after carefully looking into the code and 
> debugging, we found that it's because there is a bug in the rule 
> `PushPredicateThroughJoin`:
> It will try to push parent filter down though the join, however, when the 
> parent filter is wholly pushed down through the join, the join will become 
> the top node, and then the `transform` method will skip the join to apply the 
> rule. 
>  
> Suppose we have two tables: table1 and table2:
> table1: (a: string, b: string, c: string)
> table2: (d: string)
> sql as:
>  
> {code:java}
> select * from table1 left join (select d, 'w1' as r from table2) on a = d and 
> r = 'w2' where b = 2{code}
>  
> let's focus on the following optimizer rules:
> PushPredicateThroughJoin
> FodablePropagation
> BooleanSimplification
> PruneFilters
>  
> In the above case, on the first iteration of these rules:
> PushPredicateThroughJoin -> 
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on 
> a = d and r = 'w2'
> {code}
> FodablePropagation ->
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on 
> a = d and 'w1' = 'w2'{code}
> BooleanSimplification ->
> {code:java}
> select * from table1 where b=2 left join (select d, 'w1' as r from table2) on 
> false{code}
> PruneFilters -> No effective
>  
> After several iteration of these rules, the join condition will still never 
> be pushed to the 
> right hand of the left join. thus, in some case(e.g. Large right table), the 
> `BroadcastNestedLoopJoin` may be slow or oom.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to