[ 
https://issues.apache.org/jira/browse/SPARK-17712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531051#comment-15531051
 ] 

Josh Rosen commented on SPARK-17712:
------------------------------------

This appears to be an optimizer bug:

{code}
16/09/28 15:18:57 TRACE SparkOptimizer:
=== Applying Rule org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases ===
 Project [1 AS 1#10]                                    Project [1 AS 1#10]
 +- Filter false                                        +- Filter false
!   +- SubqueryAlias t1                                    +- Aggregate [count(1) AS count(1)#9L]
!      +- Aggregate [count(1) AS count(1)#9L]                 +- Range (1, 10, step=1, splits=Some(8))
!         +- SubqueryAlias diamonds
!            +- Range (1, 10, step=1, splits=Some(8))

16/09/28 15:18:57 DEBUG SparkOptimizer:
=== Result of Batch Finish Analysis ===
 Project [1 AS 1#10]                                    Project [1 AS 1#10]
 +- Filter false                                        +- Filter false
!   +- SubqueryAlias t1                                    +- Aggregate [count(1) AS count(1)#9L]
!      +- Aggregate [count(1) AS count(1)#9L]                 +- Range (1, 10, step=1, splits=Some(8))
!         +- SubqueryAlias diamonds
!            +- Range (1, 10, step=1, splits=Some(8))

16/09/28 15:18:57 TRACE SparkOptimizer: Fixed point reached for batch Union after 1 iterations.
16/09/28 15:18:57 TRACE SparkOptimizer: Batch Union has no effect.
16/09/28 15:18:57 TRACE SparkOptimizer: Fixed point reached for batch Subquery after 1 iterations.
16/09/28 15:18:57 TRACE SparkOptimizer: Batch Subquery has no effect.
16/09/28 15:18:57 TRACE SparkOptimizer: Fixed point reached for batch Replace Operators after 1 iterations.
16/09/28 15:18:57 TRACE SparkOptimizer: Batch Replace Operators has no effect.
16/09/28 15:18:57 TRACE SparkOptimizer: Fixed point reached for batch Aggregate after 1 iterations.
16/09/28 15:18:57 TRACE SparkOptimizer: Batch Aggregate has no effect.
16/09/28 15:18:57 TRACE SparkOptimizer:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicate ===
 Project [1 AS 1#10]                              Project [1 AS 1#10]
!+- Filter false                                  +- Aggregate [count(1) AS count(1)#9L]
!   +- Aggregate [count(1) AS count(1)#9L]           +- Filter false
       +- Range (1, 10, step=1, splits=Some(8))         +- Range (1, 10, step=1, splits=Some(8))

16/09/28 15:18:57 TRACE SparkOptimizer:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ColumnPruning ===
 Project [1 AS 1#10]                              Project [1 AS 1#10]
!+- Aggregate [count(1) AS count(1)#9L]           +- Aggregate
!   +- Filter false                                  +- Project
!      +- Range (1, 10, step=1, splits=Some(8))         +- Filter false
!                                                          +- Range (1, 10, step=1, splits=Some(8))

16/09/28 15:18:57 TRACE SparkOptimizer:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.CollapseProject ===
!Project [1 AS 1#10]                                 Aggregate [1 AS 1#10]
!+- Aggregate                                        +- Project
!   +- Project                                          +- Filter false
!      +- Filter false                                     +- Range (1, 10, step=1, splits=Some(8))
!         +- Range (1, 10, step=1, splits=Some(8))

16/09/28 15:18:57 TRACE SparkOptimizer:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PruneFilters ===
 Aggregate [1 AS 1#10]                            Aggregate [1 AS 1#10]
 +- Project                                       +- Project
!   +- Filter false                                  +- LocalRelation <empty>, [id#0L]
!      +- Range (1, 10, step=1, splits=Some(8))

16/09/28 15:18:57 TRACE SparkOptimizer: Fixed point reached for batch Operator Optimizations after 2 iterations.
{code}

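It looks like the {{PushDownPredicate}} rule is pushing the {{Filter false}} 
beneath the {{Aggregate}}, which is unsound here: the aggregate has no grouping 
expressions, so it emits one row even when its input is empty, and the filter 
no longer removes that row.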
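
To illustrate why moving the filter changes the result, here is a minimal sketch that runs the same query shapes in SQLite through Python's standard sqlite3 module. It does not exercise Spark or Catalyst at all; the table contents, the {{id}} column, the variable names, and the use of {{1 = 0}} in place of {{false}} are assumptions made only for the demonstration. The second query is a hand-written equivalent of the plan after the pushdown, not something Spark emitted.

```python
import sqlite3

# Stand-in for the non-empty diamonds table; column name and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diamonds (id INTEGER)")
conn.executemany("INSERT INTO diamonds VALUES (?)", [(i,) for i in range(1, 10)])


def run(sql):
    """Execute a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Filter above the global aggregate: the shape of the first query in the report.
filter_above = run("SELECT 1 FROM (SELECT count(*) FROM diamonds) t1 WHERE 1 = 0")

# Filter below the global aggregate: the shape of the plan after the pushdown.
filter_below = run("SELECT 1 FROM (SELECT count(*) FROM diamonds WHERE 1 = 0) t1")

# No aggregate in the subquery: the shape of the second query in the report.
no_aggregate = run("SELECT 1 FROM (SELECT * FROM diamonds) t1 WHERE 1 = 0")

print("filter above aggregate:", filter_above)
print("filter below aggregate:", filter_below)
print("no aggregate:", no_aggregate)
```

With the filter above the aggregate, or with no aggregate at all, no rows come back. With the filter below the aggregate, count(*) over the now-empty input still produces one row, so the outer SELECT returns a single row, which matches the incorrect result described in the issue.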

> Incorrect result when selecting from aggregate subquery where outer WHERE 
> clause constant-folds to false
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17712
>                 URL: https://issues.apache.org/jira/browse/SPARK-17712
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.0, 2.0.2
>            Reporter: Josh Rosen
>              Labels: correctness
>
> Let {{diamonds}} be a non-empty table. The following two queries should both 
> return no rows, but the first returns a single row:
> {code}
> SELECT
> 1
> FROM (
>     SELECT
>     count(*)
>     FROM diamonds
> ) t1
> WHERE
> false
> {code}
> {code}
> SELECT
> 1
> FROM (
>     SELECT
>     *
>     FROM diamonds
> ) t1
> WHERE
> false
> {code}


