[
https://issues.apache.org/jira/browse/SPARK-22806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Zsolt Piros updated SPARK-22806:
---------------------------------------
Description:
I got different results for aggregate functions (even for sum and count) when
the window is ordered, Window.partitionBy("column").orderBy("column"), and when
it is not ordered, Window.partitionBy("column").
Example:
test("count, sum, stddev_pop functions over window") {
  val df = Seq(
    ("a", 1, 100.0),
    ("b", 1, 200.0)).toDF("key", "partition", "value")
  df.createOrReplaceTempView("window_table")
  checkAnswer(
    df.select(
      $"key",
      count("value").over(Window.partitionBy("partition")),
      sum("value").over(Window.partitionBy("partition")),
      stddev_pop("value").over(Window.partitionBy("partition"))
    ),
    Seq(
      Row("a", 2, 300.0, 50.0),
      Row("b", 2, 300.0, 50.0)))
}

test("count, sum, stddev_pop functions over ordered by window") {
  val df = Seq(
    ("a", 1, 100.0),
    ("b", 1, 200.0)).toDF("key", "partition", "value")
  df.createOrReplaceTempView("window_table")
  checkAnswer(
    df.select(
      $"key",
      count("value").over(Window.partitionBy("partition").orderBy("key")),
      sum("value").over(Window.partitionBy("partition").orderBy("key")),
      stddev_pop("value").over(Window.partitionBy("partition").orderBy("key"))
    ),
    Seq(
      Row("a", 2, 300.0, 50.0),
      Row("b", 2, 300.0, 50.0)))
}
The "count, sum, stddev_pop functions over ordered by window" fails with the
error:
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
!struct<>                   struct<key:string,count(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):bigint,sum(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double,stddev_pop(value) OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double>
![a,2,300.0,50.0]           [a,1,100.0,0.0]
 [b,2,300.0,50.0]           [b,2,300.0,50.0]
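
A note on the frame, which the report itself does not mention: the Spark
Window documentation states that when an ordering is defined and no frame is
given, a growing frame (range frame, unbounded preceding to current row) is
used by default. That would produce running aggregates like the "Spark Answer"
above. The following is only a minimal sketch of that reading, not a confirmed
diagnosis. The object name OrderedWindowFrames and the helper compare are
hypothetical names chosen for illustration; the sketch assumes Spark 2.1 or
later on the classpath and a local master.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, stddev_pop, sum}

// Hypothetical helper object, not part of Spark or of the reported test suite.
object OrderedWindowFrames {

  // Returns one tuple per row, sorted by key:
  // (key, running count, running sum, whole count, whole sum, whole stddev_pop)
  def compare(spark: SparkSession): Seq[(String, Long, Double, Long, Double, Double)] = {
    import spark.implicits._

    // Same data as in the report.
    val df = Seq(
      ("a", 1, 100.0),
      ("b", 1, 200.0)).toDF("key", "partition", "value")

    // Ordered window without an explicit frame, as in the failing test.
    val running = Window.partitionBy("partition").orderBy("key")

    // Same ordering, but the frame is widened to the whole partition.
    val whole = running.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

    df.select(
        $"key",
        count("value").over(running),
        sum("value").over(running),
        count("value").over(whole),
        sum("value").over(whole),
        stddev_pop("value").over(whole))
      .orderBy("key")
      .as[(String, Long, Double, Long, Double, Double)]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ordered-window-frames")
      .getOrCreate()
    try compare(spark).foreach(println)
    finally spark.stop()
  }
}
```

With the explicit frame, the ordered window should return the same
whole-partition values (2, 300.0, 50.0) for both rows as the unordered window,
while the columns computed without a frame keep the running values shown in
the failure.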
> Window Aggregate functions: unexpected result at ordered partition
> ------------------------------------------------------------------
>
> Key: SPARK-22806
> URL: https://issues.apache.org/jira/browse/SPARK-22806
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Attila Zsolt Piros
>