Anton Okolnychyi created SPARK-21652:
----------------------------------------
Summary: Optimizer cannot reach a fixed point on certain queries
Key: SPARK-21652
URL: https://issues.apache.org/jira/browse/SPARK-21652
Project: Spark
Issue Type: Bug
Components: Optimizer, SQL
Affects Versions: 2.2.0
Reporter: Anton Okolnychyi
The optimizer cannot reach a fixed point on the following query:
{code}
Seq((1, 2)).toDF("col1", "col2").write.saveAsTable("t1")
Seq(1, 2).toDF("col").write.saveAsTable("t2")
spark.sql("SELECT * FROM t1, t2 WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = t2.col AND t1.col2 = t2.col").explain(true)
{code}
At some point during the optimization, InferFiltersFromConstraints infers a new
constraint '(col2#33 = col1#32)' and appends it to the join condition.
PushPredicateThroughJoin then pushes it down, ConstantPropagation replaces
'(col2#33 = col1#32)' with '(1 = 1)' based on the other propagated constraints,
ConstantFolding replaces '(1 = 1)' with 'true', and BooleanSimplification
finally removes the predicate. However, InferFiltersFromConstraints infers
'(col2#33 = col1#32)' again on the next iteration, and the process repeats
until the iteration limit is reached.
See below for more details.
{noformat}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin ===
Before:
Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (col2#33 = col1#32)
:  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:     +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.CombineFilters ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (col2#33 = col1#32)
:  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:     +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantPropagation ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantFolding ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
Before:
Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
After:
Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
:  +- Relation[col1#32,col2#33] parquet
+- Filter ((1 = col#34) && isnotnull(col#34))
   +- Relation[col#34] parquet
{noformat}
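The cycle above can be sketched with a toy rule executor in plain Scala (this is not Catalyst code; the object/rule names and the Set-of-predicates plan representation are simplifications made up for illustration). Two rules that keep undoing each other's work mean the executor never sees a pass in which no rule fires, so it only stops when the iteration limit is hit:

{code}
// Toy model of a fixed-point rule executor that cannot converge because
// one rule re-adds the predicate that the other rules remove.
object NonConvergingRules {
  type Plan = Set[String] // a "plan" reduced to its set of predicates

  // Stand-in for InferFiltersFromConstraints: from `col1 = 1` and
  // `1 = col2` it re-infers the transitive predicate `col2 = col1`.
  val infer: Plan => Plan =
    p => if (p("col1 = 1") && p("1 = col2")) p + "col2 = col1" else p

  // Stand-in for ConstantPropagation/ConstantFolding/BooleanSimplification:
  // given the constants, `col2 = col1` folds to `1 = 1` and is removed.
  val simplify: Plan => Plan = p => p - "col2 = col1"

  // Apply the batch repeatedly until no rule changes the plan
  // or the iteration limit is reached.
  def execute(start: Plan, maxIterations: Int): (Plan, Int, Boolean) = {
    var plan = start
    var iterations = 0
    var fired = true
    while (fired && iterations < maxIterations) {
      fired = false
      for (rule <- Seq(infer, simplify)) {
        val next = rule(plan)
        if (next != plan) { fired = true; plan = next }
      }
      iterations += 1
    }
    // Converged only if the last pass changed nothing.
    (plan, iterations, !fired)
  }

  def main(args: Array[String]): Unit = {
    val (_, iters, converged) = execute(Set("col1 = 1", "1 = col2"), 100)
    // Every pass re-infers and then re-removes `col2 = col1`, so a rule
    // always fires and the executor stops only at the iteration limit.
    println(s"iterations = $iters, converged = $converged")
  }
}
{code}

In this model, as in the trace above, raising the iteration limit does not help: the set of rules has no fixed point, so the inferred predicate has to be prevented from reappearing (or the convergence check has to recognize the cycle).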
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)