[ https://issues.apache.org/jira/browse/SPARK-22208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333678#comment-16333678 ]
Sean Owen commented on SPARK-22208:
-----------------------------------

It's a bug fix for a corner case of the behavior, so I'm not sure it needs to be called out in the release notes as a behavior change, if that's what you mean.

> Improve percentile_approx by not rounding up targetError and starting from index 0
> ----------------------------------------------------------------------------------
>
> Key: SPARK-22208
> URL: https://issues.apache.org/jira/browse/SPARK-22208
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Zhenhua Wang
> Assignee: Zhenhua Wang
> Priority: Major
> Labels: releasenotes
> Fix For: 2.3.0
>
>
> percentile_approx never returns the first element when the percentile is in
> (relativeError, 1/N], where relativeError defaults to 1/10000 and N is the
> total number of elements. Ideally, however, all percentiles in [0, 1/N]
> should return the first element.
> For example, given the input data 1 to 10, a query for the 10th percentile
> (or lower) should return 1, because the first value, 1, already reaches 10%.
> Currently it returns 2.
> In the paper, targetError is not rounded up, and the search index starts
> from 0 instead of 1. Following the paper should fix the cases mentioned above.
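To make the 1-to-10 example concrete, here is a hypothetical, simplified model of the query step described in the issue. It is not Spark's code: it assumes every input value is kept as its own sample (g=1, delta=0, so ranks are exact), and the shortcut for quantiles at or below relativeError is inferred from the (relativeError, 1/N] interval quoted above. The function name `query` and the `fixed` flag are illustrative only.

```python
import math


def query(values, quantile, relative_error=1e-4, fixed=True):
    """Hypothetical, simplified model of the percentile_approx query step.

    Assumption: every value is its own sample with g=1 and delta=0.
    fixed=False models the old behaviour described in the issue
    (targetError rounded up, search starting at index 1);
    fixed=True models the proposed one (no rounding, search from index 0).
    """
    samples = sorted(values)
    count = len(samples)
    # Assumed shortcut, inferred from the issue's (relativeError, 1/N] interval.
    if quantile <= relative_error:
        return samples[0]
    rank = math.ceil(quantile * count)
    if fixed:
        target_error = relative_error * count
        start = 0
    else:
        target_error = math.ceil(relative_error * count)
        start = 1
    min_rank = 0
    for i in range(start, count - 1):
        min_rank += 1                      # g = 1 for every sample
        max_rank = min_rank                # delta = 0 for every sample
        if max_rank - target_error <= rank <= min_rank + target_error:
            return samples[i]
    return samples[-1]


data = list(range(1, 11))
print(query(data, 0.1, fixed=False))  # old behaviour: 2
print(query(data, 0.1))               # proposed behaviour: 1
```

With N = 10, `math.ceil(1e-4 * 10)` rounds targetError up to a whole rank, and because the old loop starts at index 1, the value 1 is skipped and 2 is returned. Without the rounding and starting from index 0, the 10th percentile resolves to 1, as the issue expects.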