[
https://issues.apache.org/jira/browse/SPARK-17712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531105#comment-15531105
]
Josh Rosen commented on SPARK-17712:
------------------------------------
Intuitively, the only case where you can push a filter beneath an aggregate is
when that filter is defined over the grouping columns / expressions, since in
that case the filter is acting to exclude entire groups from the query (like a
HAVING clause).
However, our implementation of this logic is wrong: it checks whether a filter
condition's references are a subset of the grouping columns, but it does not
handle a condition that references no columns / attributes at all. The empty
reference set is trivially a subset of any set of grouping columns, so such a
condition (my {{false}} case, or any expression that the optimizer folds to
false) passes the check and gets pushed down.
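
To make the subset check concrete, here is a minimal Python sketch of the
predicate described above. It is not Spark's optimizer code: the function names
and the set-based model of "references" are hypothetical, and the guarded
variant is only one conservative way to reject data-independent conditions, not
necessarily the fix that Spark adopts.

```python
def can_push_buggy(filter_refs, grouping_cols):
    """Models the flawed check: push down if references are a subset of grouping columns."""
    return set(filter_refs) <= set(grouping_cols)


def can_push_guarded(filter_refs, grouping_cols):
    """Hypothetical guard: additionally require that the condition references some column."""
    refs = set(filter_refs)
    return bool(refs) and refs <= set(grouping_cols)


# A literal `false` references no columns; count(*) without GROUP BY has no grouping columns.
no_refs = set()
no_grouping = set()

buggy_decision = can_push_buggy(no_refs, no_grouping)      # empty set is a subset of anything
guarded_decision = can_push_guarded(no_refs, no_grouping)  # rejected by the non-empty guard
grouped_decision = can_push_guarded({"cut"}, {"cut", "color"})  # ordinary grouping-column filter
```

In this model the flawed check approves the pushdown for the {{false}} case,
while the guarded check rejects it and still allows filters defined over
grouping columns.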
> Incorrect result due to invalid pushdown of data-independent filter beneath
> aggregate
> -------------------------------------------------------------------------------------
>
> Key: SPARK-17712
> URL: https://issues.apache.org/jira/browse/SPARK-17712
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.2, 2.0.0, 2.0.2
> Reporter: Josh Rosen
> Labels: correctness
>
> Let {{diamonds}} be a non-empty table. The following two queries should both
> return no rows, but the first returns a single row:
> {code}
> SELECT
>   1
> FROM (
>   SELECT
>     count(*)
>   FROM diamonds
> ) t1
> WHERE
>   false
> {code}
> {code}
> SELECT
>   1
> FROM (
>   SELECT
>     *
>   FROM diamonds
> ) t1
> WHERE
>   false
> {code}
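
The reason the first query can return a row when the filter is pushed down is
that an aggregate without GROUP BY yields one row even over empty input. The
sketch below illustrates this with Python's built-in sqlite3 module rather than
Spark; the single-column {{diamonds}} schema and its values are made up for
illustration, and {{1 = 0}} stands in for {{false}}.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the non-empty diamonds table.
conn.execute("CREATE TABLE diamonds (carat REAL)")
conn.executemany("INSERT INTO diamonds VALUES (?)", [(0.23,), (0.31,)])

# Filter evaluated above the aggregate: the single aggregate row is discarded.
filter_above = conn.execute(
    "SELECT 1 FROM (SELECT count(*) FROM diamonds) t1 WHERE 1 = 0"
).fetchall()

# Filter written beneath the aggregate by hand, mimicking the invalid pushdown:
# count(*) over zero rows still produces one row, so the outer query returns a row.
filter_below = conn.execute(
    "SELECT 1 FROM (SELECT count(*) FROM diamonds WHERE 1 = 0) t1"
).fetchall()

conn.close()
```

Here {{filter_above}} is empty while {{filter_below}} contains one row, which
matches the discrepancy reported for the first query.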