[ 
https://issues.apache.org/jira/browse/SPARK-29699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021404#comment-17021404
 ] 

Thomas Graves commented on SPARK-29699:
---------------------------------------

this seems to be in the context of feature parity with postgres, we aren't 
doing that now and you say we do the same as mysql, if that is the case I would 
argue this doesn't seem like a correctness issue but a compatibility issue.  
thoughts?

> Different answers in nested aggregates with window functions
> ------------------------------------------------------------
>
>                 Key: SPARK-29699
>                 URL: https://issues.apache.org/jira/browse/SPARK-29699
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>              Labels: correctness
>
> A nested aggregate below with a window function seems to have different 
> answers in the `rsum` column  between PgSQL and Spark;
> {code:java}
> postgres=# create table gstest2 (a integer, b integer, c integer, d integer, 
> e integer, f integer, g integer, h integer);
> postgres=# insert into gstest2 values
> postgres-#   (1, 1, 1, 1, 1, 1, 1, 1),
> postgres-#   (1, 1, 1, 1, 1, 1, 1, 2),
> postgres-#   (1, 1, 1, 1, 1, 1, 2, 2),
> postgres-#   (1, 1, 1, 1, 1, 2, 2, 2),
> postgres-#   (1, 1, 1, 1, 2, 2, 2, 2),
> postgres-#   (1, 1, 1, 2, 2, 2, 2, 2),
> postgres-#   (1, 1, 2, 2, 2, 2, 2, 2),
> postgres-#   (1, 2, 2, 2, 2, 2, 2, 2),
> postgres-#   (2, 2, 2, 2, 2, 2, 2, 2);
> INSERT 0 9
> postgres=# 
> postgres=# select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
> postgres-#   from gstest2 group by rollup (a,b) order by rsum, a, b;
>  a | b | sum | rsum 
> ---+---+-----+------
>  1 | 1 |   8 |    8
>  1 | 2 |   2 |   10
>  1 |   |  10 |   20
>  2 | 2 |   2 |   22
>  2 |   |   2 |   24
>    |   |  12 |   36
> (6 rows)
> {code}
> {code:java}
> scala> sql("""
>      | select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
>      |   from gstest2 group by rollup (a,b) order by rsum, a, b
>      | """).show()
> +----+----+------+----+                                                       
>   
> |   a|   b|sum(c)|rsum|
> +----+----+------+----+
> |null|null|    12|  12|
> |   1|null|    10|  22|
> |   1|   1|     8|  30|
> |   1|   2|     2|  32|
> |   2|null|     2|  34|
> |   2|   2|     2|  36|
> +----+----+------+----+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to