[ https://issues.apache.org/jira/browse/SPARK-21652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116192#comment-16116192 ]

Anton Okolnychyi commented on SPARK-21652:
------------------------------------------

One option to fix this is NOT to apply ConstantPropagation to predicates such 
as '(col1 = col2)' when both sides can be replaced with a constant value.
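
A minimal sketch of that guard, modeled abstractly in Python (the function name, the tuple encoding of predicates, and the constants dict are all hypothetical illustrations, not Spark's actual ConstantPropagation internals):

```python
# Toy model of the proposed guard (not Spark's actual ConstantPropagation code).
# A predicate is a tuple ("=", left, right); `constants` maps column names to
# values already known from other equality constraints.

def propagate_constants(predicate, constants):
    op, left, right = predicate
    if op != "=":
        return predicate
    # Proposed fix: if BOTH sides are already known constants, leave the
    # column-equality predicate alone, so that after it is folded away
    # InferFiltersFromConstraints cannot keep re-inferring it.
    if left in constants and right in constants:
        return predicate
    # Otherwise substitute the known side as usual, e.g. (col1 = col2)
    # with col1 -> 1 becomes (1 = col2).
    return (op, constants.get(left, left), constants.get(right, right))
```

With col1 and col2 both constrained to 1, the guard leaves '(col2 = col1)' untouched instead of rewriting it to '(1 = 1)', which is the step that starts the loop in the trace below.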

> Optimizer cannot reach a fixed point on certain queries
> -------------------------------------------------------
>
>                 Key: SPARK-21652
>                 URL: https://issues.apache.org/jira/browse/SPARK-21652
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.2.0
>            Reporter: Anton Okolnychyi
>
> The optimizer cannot reach a fixed point on the following query:
> {code}
> Seq((1, 2)).toDF("col1", "col2").write.saveAsTable("t1")
> Seq(1, 2).toDF("col").write.saveAsTable("t2")
> spark.sql("SELECT * FROM t1, t2 WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = t2.col AND t1.col2 = t2.col").explain(true)
> {code}
> At some point during the optimization, InferFiltersFromConstraints infers a 
> new constraint '(col2#33 = col1#32)' that is appended to the join condition. 
> PushPredicateThroughJoin then pushes it down, ConstantPropagation replaces 
> '(col2#33 = col1#32)' with '1 = 1' based on other propagated constraints, 
> ConstantFolding replaces '1 = 1' with 'true', and BooleanSimplification 
> finally removes this predicate. However, InferFiltersFromConstraints infers 
> '(col2#33 = col1#32)' again on the next iteration, and the process continues 
> until the iteration limit is reached. See the trace below for details:
> {noformat}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin ===
> Before:
> Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (col2#33 = col1#32)
> :  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :     +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.CombineFilters ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (col2#33 = col1#32)
> :  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :     +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantPropagation ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantFolding ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> {noformat}
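
The ping-pong in the trace can be modeled abstractly. The sketch below is a deliberately compressed Python toy, not Spark's RuleExecutor: it represents the plan as a bare set of equality predicates and alternates one rule effect per iteration so the oscillation is visible. All names here are illustrative.

```python
# Toy model of the non-terminating optimizer loop (not Spark code).
# The "plan" is just a set of (left, right) equality predicates.

def infer_filter(preds):
    # Like InferFiltersFromConstraints: from the transitive constraints
    # col1 = col and col2 = col, re-infer col2 = col1.
    if ("col1", "col") in preds and ("col2", "col") in preds:
        return preds | {("col2", "col1")}
    return preds

def fold_away(preds):
    # Like ConstantPropagation + ConstantFolding + BooleanSimplification:
    # col1 and col2 are both constant 1, so col2 = col1 folds to true
    # and disappears from the plan.
    return preds - {("col2", "col1")}

def optimize(preds, max_iterations=100):
    # Alternate the two effects, checking for a fixed point between
    # iterations. Consecutive results are never equal, so the loop only
    # stops when the iteration cap is hit.
    rules = [infer_filter, fold_away]
    for i in range(1, max_iterations + 1):
        new = rules[(i - 1) % 2](preds)
        if new == preds:
            return preds, i          # fixed point reached
        preds = new
    return preds, max_iterations     # gave up at the iteration limit
```

One rule keeps re-creating exactly what the other removes, so no iteration's output ever matches the previous one, which is the fixed-point failure the issue describes.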



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
