Tim Hunter created SPARK-17439:
----------------------------------
Summary: QuantileSummaries returns the wrong result after compression
Key: SPARK-17439
URL: https://issues.apache.org/jira/browse/SPARK-17439
Project: Spark
Issue Type: Bug
Reporter: Tim Hunter
[~clockfly] found the following corner case that returns the wrong quantile
(off by 1):
{code}
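test("test QuantileSummaries compression") {
  // LEFT: one summary that is compressed after every single insert
  var left = new QuantileSummaries(10000, 0.0001)
  System.out.println("LEFT RIGHT")
  System.out.println("====================")
  (0 to 10).foreach { index =>
    left = left.insert(index)
    left = left.compress()
    // RIGHT: a fresh summary holding the same values 0..index,
    // compressed only once after all inserts
    var right = new QuantileSummaries(10000, 0.0001)
    (0 to index).foreach(right.insert(_))
    right = right.compress()
    // Print the median reported by each summary
    System.out.println(s"${left.query(0.5)} ${right.query(0.5)}")
  }
}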
{code}
The result is:
{code}
LEFT RIGHT
====================
0.0 0.0
0.0 1.0
0.0 1.0
0.0 1.0
1.0 2.0
1.0 2.0
2.0 3.0
2.0 3.0
3.0 4.0
3.0 4.0
4.0 5.0
{code}
The "LEFT" column shows the output when QuantileSummaries is used in a
Window function; the "RIGHT" column shows the expected result. The difference
between the two is that the "LEFT" column compresses the QuantileSummaries
storage after every insert, whereas the "RIGHT" column compresses only once,
after all values have been inserted.
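To make the size of the discrepancy concrete, here is a small sketch that only
re-reads the numbers printed above. It does not call Spark, and the object
name CompareColumns is made up for illustration:
{code}
// Values copied from the output reported above; nothing here calls Spark.
// CompareColumns is a hypothetical name used only for this illustration.
object CompareColumns {
  // "LEFT": compress() called after every insert
  val left: Seq[Double] =
    Seq(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0)
  // "RIGHT": compress() called once, after all inserts
  val right: Seq[Double] =
    Seq(0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0)

  // Per-row gap between the two columns (RIGHT minus LEFT)
  val gaps: Seq[Double] = left.zip(right).map { case (l, r) => r - l }

  // Row indices (0-based, matching `index` in the test) where the columns differ
  val mismatches: Seq[Int] =
    gaps.zipWithIndex.collect { case (g, i) if g != 0.0 => i }

  def main(args: Array[String]): Unit = {
    println(s"rows that differ: ${mismatches.mkString(", ")}")
    println(s"largest gap: ${gaps.max}")
  }
}
{code}
On the reported data, every row except the first differs, and the gap is
exactly 1 in each case.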