[jira] [Commented] (MATH-1169) Faster Percentile

Thomas Neidhart (JIRA) Thu, 03 Dec 2015 13:59:42 -0800

    [ 
https://issues.apache.org/jira/browse/MATH-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038673#comment-15038673
 ]


Thomas Neidhart commented on MATH-1169:
---------------------------------------

The advantage of the current implementation is that the pivots are cached, thus 
sub-sequent calls will be faster. In your benchmark you select new values for 
each iteration, which renders the pivot cache useless.

We could try adding the FloydRivest selection as an extension of the current 
KthSelector implementation, and a user can select which selector to use via the 
withKthSelector method in Percentile.

> Faster Percentile
> -----------------
>
>                 Key: MATH-1169
>                 URL: https://issues.apache.org/jira/browse/MATH-1169
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Adam Stelmaszczyk
>            Priority: Minor
>         Attachments: FloydRivest.java, Main.java, PercentileFloydRivest.java
>
>
> Percentile class is now using Hoare selection algorithm (aka 
> [Quickselect|http://en.wikipedia.org/wiki/Quickselect]) with median of 3 for 
> choosing a pivot and caching pivots. Quickselect expected runtime is about 
> 3.4N + O(N).
> The constant can be improved to 1.5N by different pivot strategy involving 
> sampling, yielding the [Floyd–Rivest 
> algorithm|http://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm], which 
> has average complexity of 1.5N + O(N) for median, with other position 
> statistics even faster.
> For proof of concept I implemented Floyd–Rivest algorithm without caching 
> pivots following Wikipedia and benchmarked it with 
> [Caliper|https://code.google.com/p/caliper/].
> Array size - runtime for Floyd-Rivest - runtime for current Percentile - % 
> faster
> 100000 - 6.907 - 7.083 - 2.5% 
> [link|https://microbenchmarks.appspot.com/runs/a0d947ee-57fc-4636-b687-b4bc5170f1d7#r:scenario.benchmarkSpec.methodName]
> 1000000 - 70.3 - 75.4 - 6.8% 
> [link|https://microbenchmarks.appspot.com/runs/77291c2e-6dbb-4666-bf37-c174e4b53f2e#r:scenario.benchmarkSpec.methodName]
> 10000000 - 708 - 868 - 18.4% 
> [link|https://microbenchmarks.appspot.com/runs/c0f65e35-44c0-458e-b6e0-634b4e6fa68b#r:scenario.benchmarkSpec.methodName]
> In attachments:
> {{Main.java}} should be run to repeat experiments, it needs Caliper JAR: 
> https://code.google.com/p/caliper/downloads/detail?name=caliper-1.0-beta-1-all.jar
> Every call to evaluate() I'm filling the input double[] array with different 
> random numbers.
> {{FloydRivest.java}} contatins implementation of algorithm basing on 
> pseudocode from its Wikipedia page.
> {{PercentileFloydRivest.java}} is Percentile.java with modified evaluate(), 
> so it calls FloydRivest.select() instead of typical select(). pivotsHeap was 
> removed for simplicity (maybe it can improve efficiency even more).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MATH-1169) Faster Percentile

Reply via email to