[
https://issues.apache.org/jira/browse/SPARK-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-17120:
------------------------------------
Assignee: Apache Spark
> Analyzer incorrectly optimizes plan to empty LocalRelation
> ----------------------------------------------------------
>
> Key: SPARK-17120
> URL: https://issues.apache.org/jira/browse/SPARK-17120
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Josh Rosen
> Assignee: Apache Spark
> Priority: Blocker
>
> Consider the following query:
> {code}
> sc.parallelize(Seq(97)).toDF("int_col_6").createOrReplaceTempView("table_3")
> sc.parallelize(Seq(0)).toDF("int_col_1").createOrReplaceTempView("table_4")
> println(sql("""
> SELECT
> *
> FROM (
> SELECT
> COALESCE(t2.int_col_1, t1.int_col_6) AS int_col
> FROM table_3 t1
> LEFT JOIN table_4 t2 ON false
> ) t where (t.int_col) is not null
> """).collect().toSeq)
> {code}
> In the innermost query, the LEFT JOIN's condition is {{false}} but
> nevertheless the number of rows produced should equal the number of rows in
> {{table_3}} (which is non-empty). Since no values are {{null}}, the outer
> {{where}} should retain all rows, so the overall result of this query should
> contain a single row with the value '97'.
> Instead, the current Spark master (as of
> 12a89e55cbd630fa2986da984e066cd07d3bf1f7 at least) returns no rows. Looking
> at {{explain}}, it appears that the logical plan is optimizing to
> {{LocalRelation <empty>}}, so Spark doesn't even run the query. My suspicion
> is that there's a bug in constraint propagation or filter pushdown.
> This issue doesn't seem to affect Spark 2.0, so I think it's a regression in
> master.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]