Tim Hunter created SPARK-17439:
----------------------------------

             Summary: QuantilesSummaries returns the wrong result after 
compression
                 Key: SPARK-17439
                 URL: https://issues.apache.org/jira/browse/SPARK-17439
             Project: Spark
          Issue Type: Bug
            Reporter: Tim Hunter


[~clockfly] found the following corner case that returns the wrong quantile 
(off by 1):

{code}
test("test QuantileSummaries compression") {
  // Summary that is compressed after every single insert.
  var left = new QuantileSummaries(10000, 0.0001)
  System.out.println("LEFT      RIGHT")
  System.out.println("====================")
  (0 to 10).foreach { index =>
    left = left.insert(index)
    left = left.compress()

    // Fresh summary holding the same values 0..index, compressed only once.
    var right = new QuantileSummaries(10000, 0.0001)
    // Keep the summary returned by insert instead of discarding it.
    (0 to index).foreach { i => right = right.insert(i) }
    right = right.compress()
    System.out.println(s"${left.query(0.5)}   ${right.query(0.5)}")
  }
}
{code}

The result is:
{code}
LEFT      RIGHT
====================
0.0   0.0
0.0   1.0
0.0   1.0
0.0   1.0
1.0   2.0
1.0   2.0
2.0   3.0
2.0   3.0
3.0   4.0
3.0   4.0
4.0   5.0
{code}


The "LEFT" column is the output when QuantileSummaries is used in a window 
function; the "RIGHT" column is the expected result. The difference between 
the two columns is that "LEFT" compresses the QuantileSummaries storage after 
every insert, whereas "RIGHT" compresses only once, after all values have 
been inserted.
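
To make the "off by 1" claim concrete, the sketch below tallies the reported rows. It does not call QuantileSummaries; it only compares the two columns copied from the output above. The object name CompareColumns and its fields are hypothetical and used for illustration only.

```scala
// Hypothetical helper: compares the LEFT and RIGHT columns of the reported output.
object CompareColumns {
  // Rows copied from the report: (LEFT, RIGHT) for index 0 to 10.
  val rows: Seq[(Double, Double)] = Seq(
    (0.0, 0.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0),
    (1.0, 2.0), (1.0, 2.0), (2.0, 3.0), (2.0, 3.0),
    (3.0, 4.0), (3.0, 4.0), (4.0, 5.0)
  )

  // RIGHT minus LEFT for every row.
  val diffs: Seq[Double] = rows.map { case (left, right) => right - left }

  // Number of rows where the two columns disagree.
  val mismatches: Int = diffs.count(_ != 0.0)

  def main(args: Array[String]): Unit = {
    println(s"mismatching rows: $mismatches of ${rows.size}")
    println(s"distinct non-zero differences: ${diffs.filter(_ != 0.0).distinct.mkString(", ")}")
  }
}
```

On the rows above, 10 of the 11 rows disagree, and every disagreement is exactly 1.0. Only the first row (index 0) matches.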



