Apache Spark reassigned SPARK-21652:
------------------------------------
Assignee: (was: Apache Spark)
> Optimizer cannot reach a fixed point on certain queries
> -------------------------------------------------------
>
> Key: SPARK-21652
> URL: https://issues.apache.org/jira/browse/SPARK-21652
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, SQL
> Affects Versions: 2.2.0
> Reporter: Anton Okolnychyi
>
> The optimizer cannot reach a fixed point on the following query:
> {code}
> Seq((1, 2)).toDF("col1", "col2").write.saveAsTable("t1")
> Seq(1, 2).toDF("col").write.saveAsTable("t2")
> spark.sql("SELECT * FROM t1, t2 WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = t2.col AND t1.col2 = t2.col").explain(true)
> {code}
> At some point during the optimization, InferFiltersFromConstraints infers a
> new constraint '(col2#33 = col1#32)' that is appended to the join condition.
> PushPredicateThroughJoin then pushes it down, CombineFilters merges it into
> the existing filter, ConstantPropagation replaces '(col2#33 = col1#32)' with
> '1 = 1' based on other propagated constraints, ConstantFolding replaces
> '1 = 1' with 'true', and BooleanSimplification finally removes the predicate.
> However, InferFiltersFromConstraints infers '(col2#33 = col1#32)' again on
> the next iteration, and the process repeats until the iteration limit is
> reached.
> See the rule-by-rule trace below.
> {noformat}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin ===
> Before:
> Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (col2#33 = col1#32)
> :  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :     +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.CombineFilters ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (col2#33 = col1#32)
> :  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :     +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantPropagation ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantFolding ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (1 = 1))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && true)
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
> Before:
> Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> After:
> Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
> :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> :  +- Relation[col1#32,col2#33] parquet
> +- Filter ((1 = col#34) && isnotnull(col#34))
>    +- Relation[col#34] parquet
> {noformat}
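The non-convergence in the trace can be sketched as a toy fixed-point driver. This is a hypothetical Python abstraction, not Spark's actual RuleExecutor: it collapses each full pass of the batch into a single transition between the two plan shapes the trace oscillates between, and shows why the "plan unchanged after a full pass" convergence check never succeeds before the iteration limit (100 by default) is exhausted.

```python
# Toy model of a Catalyst-style fixed-point batch (hypothetical; the names
# below are illustrative, not Spark APIs). The "plan" is reduced to the two
# shapes seen in the trace above.
INFERRED = "join condition contains inferred (col2#33 = col1#32)"
SIMPLIFIED = "inferred predicate pushed down, folded, and removed"

def run_batch(plan):
    """One full pass over the batch: if the inferred predicate is present, the
    push/propagate/fold/simplify rules remove it; if it is absent,
    InferFiltersFromConstraints re-infers it from the join's constraints."""
    if plan == INFERRED:
        return SIMPLIFIED
    return INFERRED

def optimize(plan, max_iterations=100):
    """Run the batch to a fixed point, giving up after max_iterations passes,
    like Spark's iteration limit."""
    for i in range(1, max_iterations + 1):
        new_plan = run_batch(plan)
        if new_plan == plan:      # fixed point: a full pass changed nothing
            return plan, i
        plan = new_plan
    return plan, max_iterations   # never converged; the limit was hit

final_plan, iterations = optimize(SIMPLIFIED)
print(iterations)  # 100: every pass flips the plan, so no fixed point exists
```

Because the composite batch function has a cycle of length two, consecutive end-of-pass plans are never equal, and the driver always runs to its cap.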
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]