[ https://issues.apache.org/jira/browse/SPARK-34882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tanel Kiis updated SPARK-34882:
-------------------------------
Description:
{code:title=group-by.sql}
SELECT
first(DISTINCT a), last(DISTINCT a),
first(a), last(a),
first(DISTINCT b), last(DISTINCT b),
first(b), last(b)
FROM testData WHERE a IS NOT NULL AND b IS NOT NULL;
{code}
{code:title=group-by.sql.out}
-- !query schema
struct<first(DISTINCT a):int,last(DISTINCT a):int,first(a):int,last(a):int,first(DISTINCT b):int,last(DISTINCT b):int,first(b):int,last(b):int>
-- !query output
NULL 1 1 3 1 NULL 1 2
{code}
None of the results should be NULL, because the WHERE clause filters out every row in which a or b is NULL.
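A simplified model of how NULLs can reach the aggregates even though the input has none. This is plain Python, not Spark code, and it assumes the rewrite expands each row into one copy per DISTINCT column (tagged with a group id, with the other column set to NULL) and then feeds each aggregate a NULL for rows of another group. The sample rows and all function names are hypothetical. first() and last() here return NULLs as-is, like an aggregator that does not ignore NULLs.

```python
# Simplified model of an expand-based distinct-aggregate rewrite.
# Plain Python for illustration only; none of these names are Spark APIs.

# Hypothetical sample rows (a, b) that already satisfy the WHERE clause.
ROWS = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]

GID_A, GID_B = 1, 2  # one group id per DISTINCT column


def expand(rows):
    """Emit one copy of each row per distinct group, nulling the other column."""
    out = []
    for a, b in rows:
        out.append((GID_A, a, None))
        out.append((GID_B, None, b))
    return out


def dedupe(rows):
    """Drop duplicate (gid, a, b) rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))


def first(values):
    """Return the first value as-is, even if it is None (NULLs not ignored)."""
    values = list(values)
    return values[0] if values else None


def last(values):
    """Return the last value as-is, even if it is None (NULLs not ignored)."""
    values = list(values)
    return values[-1] if values else None


def aggregate_nullifying(rows):
    """Feed every row to each aggregate, with None for rows of another group."""
    a_in = [a if gid == GID_A else None for gid, a, _ in rows]
    b_in = [b if gid == GID_B else None for gid, _, b in rows]
    return first(a_in), last(a_in), first(b_in), last(b_in)


def aggregate_filtering(rows):
    """Feed each aggregate only the rows that belong to its own group."""
    a_in = [a for gid, a, _ in rows if gid == GID_A]
    b_in = [b for gid, _, b in rows if gid == GID_B]
    return first(a_in), last(a_in), first(b_in), last(b_in)


expanded = dedupe(expand(ROWS))
print(aggregate_nullifying(expanded))  # (1, 3, None, None)
print(aggregate_filtering(expanded))   # (1, 3, 1, 2)
```

In this model the injected NULLs only surface for aggregates that do not skip them. Which positions come out NULL depends on the row order after expansion, which Spark does not guarantee. Filtering by group id instead of nullifying is shown only as one possible way to keep the injected NULLs away from the aggregates.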
was:
{code:title=group-by.sql}
SELECT first(DISTINCT a), last(DISTINCT a), first(DISTINCT b), last(DISTINCT b)
FROM testData WHERE a IS NOT NULL AND b IS NOT NULL;
{code}
{code:title=group-by.sql.out}
-- !query
SELECT first(DISTINCT a), last(DISTINCT a), first(DISTINCT b), last(DISTINCT b)
FROM testData WHERE a IS NOT NULL AND b IS NOT NULL
-- !query schema
struct<first(DISTINCT a):int,last(DISTINCT a):int,first(DISTINCT b):int,last(DISTINCT b):int>
-- !query output
1 3 NULL NULL
{code}
The results should not be NULL, because NULL inputs are filtered out.
> RewriteDistinctAggregates can cause a bug if the aggregator does not ignore NULLs
> ---------------------------------------------------------------------------------
>
> Key: SPARK-34882
> URL: https://issues.apache.org/jira/browse/SPARK-34882
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Tanel Kiis
> Priority: Major