Anton Okolnychyi created SPARK-21652:
----------------------------------------

             Summary: Optimizer cannot reach a fixed point on certain queries
                 Key: SPARK-21652
                 URL: https://issues.apache.org/jira/browse/SPARK-21652
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 2.2.0
            Reporter: Anton Okolnychyi


The optimizer cannot reach a fixed point on the following query:

{code}
Seq((1, 2)).toDF("col1", "col2").write.saveAsTable("t1")
Seq(1, 2).toDF("col").write.saveAsTable("t2")
spark.sql("SELECT * FROM t1, t2 WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = 
t2.col AND t1.col2 = t2.col").explain(true)
{code}

At some point during the optimization, InferFiltersFromConstraints infers a new 
constraint '(col2#33 = col1#32)' that is appended to the join condition. 
PushPredicateThroughJoin then pushes it down, ConstantPropagation replaces 
'(col2#33 = col1#32)' with '1 = 1' based on the other propagated constraints, 
ConstantFolding replaces '1 = 1' with 'true', and BooleanSimplification finally 
removes the predicate. On the next iteration, however, InferFiltersFromConstraints 
infers '(col2#33 = col1#32)' again, and the cycle repeats until the iteration 
limit is reached.

See the rule-by-rule trace below for more details.
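
For reference, one way to capture such a trace from spark-shell (assuming the 
log4j 1.x API bundled with Spark 2.2) is to raise the rule executor's logger to 
TRACE before running the query:

{code}
import org.apache.log4j.{Level, Logger}

// RuleExecutor logs every rule application (the "=== Applying Rule ... ==="
// blocks below) at TRACE level under this package.
Logger.getLogger("org.apache.spark.sql.catalyst.rules").setLevel(Level.TRACE)
{code}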

{noformat}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin ===
Before:
Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (col2#33 = col1#32)
:  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:     +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.CombineFilters ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (col2#33 = col1#32)
:  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:     +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantPropagation ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantFolding ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
{noformat}
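
To make the non-convergence concrete, below is a toy model of the cycle in 
plain Scala. It is only a sketch: Plan, push, infer, and simplify are 
hypothetical stand-ins for the real plan and rules, and it assumes (as the 
trace suggests) that the inferred predicate is re-added in one pass and 
eliminated in the next. The plan then alternates between two states, so a 
fixed-point check that compares each pass's result only with the immediately 
preceding one never fires, and the loop exhausts its entire iteration budget 
(by default, spark.sql.optimizer.maxIterations = 100).

{code}
object FixedPointToy {
  // A "plan" reduced to its join condition and its left-side filter predicates.
  case class Plan(joinCond: Set[String], filter: Set[String])

  val inferred = "col2#33 = col1#32"

  // Stand-in for PushPredicateThroughJoin: moves the inferred equality
  // out of the join condition and into the left-side filter.
  def push(p: Plan): Plan =
    if (p.joinCond(inferred)) Plan(p.joinCond - inferred, p.filter + inferred)
    else p

  // Stand-in for InferFiltersFromConstraints: re-infers the equality from
  // (col1#32 = 1) and (1 = col2#33) unless it is already present somewhere.
  def infer(p: Plan): Plan =
    if (!p.joinCond(inferred) && !p.filter(inferred))
      Plan(p.joinCond + inferred, p.filter)
    else p

  // Stand-in for ConstantPropagation + ConstantFolding + BooleanSimplification:
  // rewrites the pushed-down equality to (1 = 1), folds it to true, drops it.
  def simplify(p: Plan): Plan = Plan(p.joinCond, p.filter - inferred)

  def main(args: Array[String]): Unit = {
    val rules: Seq[Plan => Plan] = Seq(push, infer, simplify)
    val maxIterations = 100 // Catalyst's default fixed-point budget
    var last = Plan(Set.empty, Set("col1#32 = 1", "1 = col2#33"))
    var i = 0
    var fixedPoint = false
    while (i < maxIterations && !fixedPoint) {
      val cur = rules.foldLeft(last)((p, r) => r(p))
      fixedPoint = cur == last // compares only with the previous pass
      last = cur
      i += 1
    }
    if (fixedPoint) println(s"fixed point after $i iterations")
    else println(s"Max iterations ($maxIterations) reached") // always hit here
  }
}
{code}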



