[ https://issues.apache.org/jira/browse/SPARK-17439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-17439:
------------------------------------
Assignee: Apache Spark
> QuantilesSummaries returns the wrong result after compression
> -------------------------------------------------------------
>
> Key: SPARK-17439
> URL: https://issues.apache.org/jira/browse/SPARK-17439
> Project: Spark
> Issue Type: Bug
> Reporter: Tim Hunter
> Assignee: Apache Spark
>
> [~clockfly] found the following corner case that returns the wrong quantile
> (off by 1):
> {code}
> test("test QuantileSummaries compression") {
>   var left = new QuantileSummaries(10000, 0.0001)
>   System.out.println("LEFT RIGHT")
>   System.out.println("====================")
>   (0 to 10).foreach { index =>
>     left = left.insert(index)
>     left = left.compress()
>     var right = new QuantileSummaries(10000, 0.0001)
>     (0 to index).foreach(right.insert(_))
>     right = right.compress()
>     System.out.println(s"${left.query(0.5)} ${right.query(0.5)}")
>   }
> }
> {code}
> The result is:
> {code}
> LEFT RIGHT
> ====================
> 0.0 0.0
> 0.0 1.0
> 0.0 1.0
> 0.0 1.0
> 1.0 2.0
> 1.0 2.0
> 2.0 3.0
> 2.0 3.0
> 3.0 4.0
> 3.0 4.0
> 4.0 5.0
> {code}
> The "LEFT" column shows the output when QuantileSummaries is used in a
> Window function; the "RIGHT" column shows the expected result. The
> difference between the two columns is that "LEFT" compresses the
> QuantileSummaries storage after every insert, while "RIGHT" compresses
> only once, after all values have been inserted.