[ 
https://issues.apache.org/jira/browse/SPARK-34882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanel Kiis updated SPARK-34882:
-------------------------------
    Description: 
{code:title=group-by.sql}
SELECT
    first(DISTINCT a), last(DISTINCT a),
    first(a), last(a),
    first(DISTINCT b), last(DISTINCT b),
    first(b), last(b)
FROM testData WHERE a IS NOT NULL AND b IS NOT NULL;{code}
{code:title=group-by.sql.out}
-- !query schema
struct<first(DISTINCT a):int,last(DISTINCT a):int,first(a):int,last(a):int,first(DISTINCT b):int,last(DISTINCT b):int,first(b):int,last(b):int>
-- !query output
NULL    1       1       3       1       NULL    1       2
{code}

None of these results should be NULL, because the WHERE clause filters out every row in which a or b is NULL.
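
The issue title attributes the NULLs to RewriteDistinctAggregates combined with aggregators that do not ignore NULLs. The toy Python model below sketches one way that could happen. It assumes the rewrite produces one group of rows per DISTINCT column, pads the other columns with NULL, and tags each row with a group id. The names `expand` and `gid` and the sample rows are hypothetical; this is not Spark's code and not the real testData.

```python
# Toy model only: this is NOT Spark's implementation.
rows = [(1, 1), (1, 2), (2, 1), (3, 2)]  # hypothetical (a, b) rows, none NULL


def expand(rows):
    """Build one group of rows per DISTINCT column, padding the other column with None."""
    distinct_a = sorted({a for a, _ in rows})
    distinct_b = sorted({b for _, b in rows})
    group_a = [{"gid": 1, "a": a, "b": None} for a in distinct_a]
    group_b = [{"gid": 2, "a": None, "b": b} for b in distinct_b]
    return group_a + group_b


def first(values):
    """first() that does not ignore NULLs: returns the first value even if it is None."""
    return values[0]


def last(values):
    """last() that does not ignore NULLs: returns the last value even if it is None."""
    return values[-1]


expanded = expand(rows)

# Naive: aggregate over every expanded row, so the padding NULLs are visible.
naive_first_b = first([r["b"] for r in expanded])
naive_last_a = last([r["a"] for r in expanded])

# Guarded: only look at rows from the group that belongs to this aggregate.
guarded_first_b = first([r["b"] for r in expanded if r["gid"] == 2])
guarded_last_a = last([r["a"] for r in expanded if r["gid"] == 1])

print(naive_first_b, naive_last_a)      # None None
print(guarded_first_b, guarded_last_a)  # 1 3
```

In this model the NULLs come only from the padding, never from the input, which is consistent with the symptom reported here. Which aggregate returns NULL depends on row order, so the exact columns affected may differ from run to run.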

  was:
{code:title=group-by.sql}
SELECT first(DISTINCT a), last(DISTINCT a), first(DISTINCT b), last(DISTINCT b)
FROM testData WHERE a IS NOT NULL AND b IS NOT NULL;
{code}
{code:title=group-by.sql.out}
-- !query
SELECT first(DISTINCT  a), last(DISTINCT  a), first(DISTINCT  b), last(DISTINCT  b)
FROM testData WHERE a IS NOT NULL AND b IS NOT NULL
-- !query schema
struct<first(DISTINCT a):int,last(DISTINCT a):int,first(DISTINCT b):int,last(DISTINCT b):int>
-- !query output
1       3       NULL    NULL
{code}

The results should not be NULL, because NULL inputs are filtered out.


> RewriteDistinctAggregates can cause a bug if the aggregator does not ignore NULLs
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-34882
>                 URL: https://issues.apache.org/jira/browse/SPARK-34882
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Tanel Kiis
>            Priority: Major


