[ 
https://issues.apache.org/jira/browse/SPARK-17712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531105#comment-15531105
 ] 

Josh Rosen commented on SPARK-17712:
------------------------------------

Intuitively, the only case where you can push a filter beneath an aggregate is 
when that filter is defined over the grouping columns / expressions, since in 
that case the filter is acting to exclude entire groups from the query (like a 
HAVING clause).

However, our implementation of this logic is wrong: it checks whether a 
filter condition's references are a subset of the grouping columns, but it 
does not handle an expression that references no columns / attributes at all, 
as in my {{false}} case (or any expression that the optimizer folds to false). 
An empty set of references is trivially a subset of the grouping columns, so 
the check passes and the filter is pushed beneath the aggregate.
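
The failure mode described above can be sketched in a few lines of Python. 
This is a toy model, not Spark's optimizer code: {{Predicate}}, 
{{can_push_naive}}, {{can_push_guarded}}, {{count_all}} and the {{cut}} column 
are hypothetical names used only for illustration, and the guard shown is one 
possible way to exclude reference-free predicates, not necessarily the fix 
that was adopted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    sql: str               # textual form, for readability only
    references: frozenset  # column names the predicate reads


def can_push_naive(pred, grouping_cols):
    # The check as described in the comment: references must be a subset
    # of the grouping columns. An empty set passes vacuously.
    return pred.references <= frozenset(grouping_cols)


def can_push_guarded(pred, grouping_cols):
    # Assumed guard: additionally require at least one referenced column.
    return bool(pred.references) and pred.references <= frozenset(grouping_cols)


def count_all(rows):
    # A global aggregate (no GROUP BY) emits exactly one row,
    # even when its input is empty.
    return [len(rows)]


always_false = Predicate("false", frozenset())
on_group_col = Predicate("cut = 'Ideal'", frozenset({"cut"}))

naive_global = can_push_naive(always_false, [])        # True: the bug
guarded_global = can_push_guarded(always_false, [])    # False
guarded_group = can_push_guarded(on_group_col, ["cut"])  # True

# Why the pushdown changes the result for a non-empty table:
rows = ["d1", "d2", "d3"]
keep = lambda _row: False  # stands in for WHERE false

filter_above = [r for r in count_all(rows) if keep(r)]  # correct plan
filter_below = count_all([r for r in rows if keep(r)])  # after bad pushdown
```

With the filter above the aggregate, the single count row is discarded and the 
result is empty. With the filter pushed below, the aggregate sees no input but 
still emits one row, which matches the single row reported in the issue.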

> Incorrect result due to invalid pushdown of data-independent filter beneath 
> aggregate
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-17712
>                 URL: https://issues.apache.org/jira/browse/SPARK-17712
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.0, 2.0.2
>            Reporter: Josh Rosen
>              Labels: correctness
>
> Let {{diamonds}} be a non-empty table. The following two queries should both 
> return no rows, but the first returns a single row:
> {code}
> SELECT
> 1
> FROM (
>     SELECT
>     count(*)
>     FROM diamonds
> ) t1
> WHERE
> false
> {code}
> {code}
> SELECT
> 1
> FROM (
>     SELECT
>     *
>     FROM diamonds
> ) t1
> WHERE
> false
> {code}



