Attila Zsolt Piros created SPARK-22806:
------------------------------------------

             Summary: Window Aggregate functions: unexpected result at ordered partition
                 Key: SPARK-22806
                 URL: https://issues.apache.org/jira/browse/SPARK-22806
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Attila Zsolt Piros


I got different results for aggregate functions (even for sum and count) 
when the window is ordered, Window.partitionBy("column").orderBy("column"), 
and when it is not ordered, Window.partitionBy("column").

Example:

  test("count, sum, stddev_pop functions over window") {
    val df = Seq(
      ("a", 1, 100.0),
      ("b", 1, 200.0)).toDF("key", "partition", "value")
    df.createOrReplaceTempView("window_table")
    checkAnswer(
      df.select(
        $"key",
        count("value").over(Window.partitionBy("partition")),
        sum("value").over(Window.partitionBy("partition")),
        stddev_pop("value").over(Window.partitionBy("partition"))
      ),
      Seq(
        Row("a", 2, 300.0, 50.0),
        Row("b", 2, 300.0, 50.0)))
  }

  test("count, sum, stddev_pop functions over ordered by window") {
    val df = Seq(
      ("a", 1, 100.0),
      ("b", 1, 200.0)).toDF("key", "partition", "value")
    df.createOrReplaceTempView("window_table")
    checkAnswer(
      df.select(
        $"key",
        count("value").over(Window.partitionBy("partition").orderBy("key")),
        sum("value").over(Window.partitionBy("partition").orderBy("key")),
        stddev_pop("value").over(Window.partitionBy("partition").orderBy("key"))
      ),
      Seq(
        Row("a", 2, 300.0, 50.0),
        Row("b", 2, 300.0, 50.0)))
  }

The test "count, sum, stddev_pop functions over ordered by window" fails with 
the following error:
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
!struct<>                   struct<key:string,count(value) OVER (PARTITION BY 
partition ORDER BY key ASC NULLS FIRST unspecifiedframe$()):bigint,sum(value) 
OVER (PARTITION BY partition ORDER BY key ASC NULLS FIRST 
unspecifiedframe$()):double,stddev_pop(value) OVER (PARTITION BY partition 
ORDER BY key ASC NULLS FIRST unspecifiedframe$()):double>
![a,2,300.0,50.0]           [a,1,100.0,0.0]
 [b,2,300.0,50.0]           [b,2,300.0,50.0]
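
For reference, the Spark answer above is consistent with a running aggregate: each row is aggregated only with the rows up to and including itself in key order, rather than with the whole partition. The sketch below is plain Scala without Spark and only illustrates that reading. The names values, stddevPop, whole and running are hypothetical and not part of the test suite, and nothing here is taken from Spark's implementation.

```scala
// Hypothetical stand-in for the (key, value) rows of the single partition,
// already sorted by key as in orderBy("key").
val values = Seq("a" -> 100.0, "b" -> 200.0)

// Population standard deviation of a non-empty sequence.
def stddevPop(xs: Seq[Double]): Double = {
  val mean = xs.sum / xs.size
  math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.size)
}

// Aggregate over the whole partition for every row
// (what the unordered window test expects).
val whole = values.map { case (k, _) =>
  val xs = values.map(_._2)
  (k, xs.size, xs.sum, stddevPop(xs))
}

// Aggregate over the rows from the start of the partition up to and
// including the current row (a running, or growing, frame).
val running = values.zipWithIndex.map { case ((k, _), i) =>
  val xs = values.take(i + 1).map(_._2)
  (k, xs.size, xs.sum, stddevPop(xs))
}

whole.foreach(println)   // whole-partition aggregates
running.foreach(println) // running aggregates
```

Here whole reproduces the "Correct Answer" column and running reproduces the "Spark Answer" column. This matches the documented default of the Window API, which uses a growing frame (unbounded preceding to current row) when an ordering is defined and no frame is given. Assuming that is the cause, specifying the frame explicitly, for example with rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) on the ordered window, should give the whole-partition values.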