[
https://issues.apache.org/jira/browse/MATH-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168038#comment-17168038
]
institute for information industry commented on MATH-1551:
----------------------------------------------------------
I just pushed it again.
It seems that the deviations are not caused by accumulated errors.
I infer that they are caused by the formula.
When we compute the index for a given p with the non-weighted percentile,
index = p * (N - 1) + 1, where N denotes the number of samples, and
estimation = X_{k} + (X_{k+1} - X_{k}) * (index - k), where k = floor(index).
But when we compute it with the weighted percentile and set all weights to 1,
estimation = X_{k} + (X_{k+1} - X_{k}) * (p*N*(N-1) - (k-1)*N)/N
Though the term "(p*N*(N-1) - (k-1)*N)/N" is mathematically equal to
"(index - k)",
the former performs extra multiplications and a division, and their rounding
may cause the deviation.
I think it's inevitable.
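
To make the rounding argument concrete, here is a minimal Java sketch that evaluates the interpolation fraction in two algebraically equivalent ways and reports how far apart the results are. The class and method names are hypothetical, p is taken as a fraction in [0, 1], and the "scaled" form is an assumed stand-in (numerator and denominator both multiplied by N) for what the weighted code path computes with unit weights; it is not copied from the attached Percentile.java.

```java
class FractionDeviation {
    /** k = floor(index) with index = p * (n - 1) + 1, shared by both forms. */
    static double k(double p, int n) {
        return Math.floor(p * (n - 1) + 1);
    }

    /** Direct form used by the non-weighted estimate: index - k. */
    static double direct(double p, int n) {
        double index = p * (n - 1) + 1;
        return index - k(p, n);
    }

    /** Assumed scaled form: (p*n*(n-1) - (k-1)*n) / n, equal to index - k on paper. */
    static double scaled(double p, int n) {
        return (p * n * (n - 1) - (k(p, n) - 1) * n) / n;
    }

    /** Largest absolute difference between the two forms for p = 0.01 .. 0.99. */
    static double maxDeviation(int n) {
        double max = 0;
        for (int i = 1; i < 100; i++) {
            double p = i / 100.0;
            max = Math.max(max, Math.abs(direct(p, n) - scaled(p, n)));
        }
        return max;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println("n=" + n + " max deviation=" + maxDeviation(n));
        }
    }
}
```

Any difference this prints comes only from floating-point rounding, so a unit test comparing the weighted and non-weighted results would probably need a small tolerance rather than exact equality.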
> Compute Percentile with Weighted Samples
> ----------------------------------------
>
> Key: MATH-1551
> URL: https://issues.apache.org/jira/browse/MATH-1551
> Project: Commons Math
> Issue Type: New Feature
> Affects Versions: 4.0
> Reporter: institute for information industry
> Priority: Major
> Labels: features, newbie
> Fix For: 4.0
>
> Attachments: Percentile.java
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The class Percentile only supports estimation on non-weighted samples.
> I've implemented some functions to estimate percentiles of weighted samples.
> Here is the reference:
> [https://stats.stackexchange.com/questions/13169/defining-quantiles-over-a-weighted-sample]
>
> When all weights are equal, it behaves like estimation on
> non-weighted samples under R-7.
> I can't find formulas for the other rules, but at least it can now evaluate
> percentiles for weighted samples.
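
For readers without the attachment, the following Java sketch shows one way such an estimate could be computed. It assumes the formula from the referenced answer is S_k = (k-1)*w_k + (N-1)*(w_1 + ... + w_{k-1}), with the estimate interpolated between X_k and X_{k+1} by (p*S_N - S_k)/(S_{k+1} - S_k). The class name, method name, and the choice of p in [0, 1] are hypothetical, the input is assumed to be sorted with strictly positive weights, and this is not the attached Percentile.java.

```java
class WeightedPercentileSketch {
    /**
     * Estimates the p-quantile (p in [0, 1]) of values that are already sorted
     * in ascending order, each with a strictly positive weight.
     */
    static double percentile(double[] sortedValues, double[] weights, double p) {
        int n = sortedValues.length;
        if (n == 0 || weights.length != n || p < 0 || p > 1) {
            throw new IllegalArgumentException("invalid input");
        }
        if (n == 1) {
            return sortedValues[0];
        }
        // s[i] holds S_k for k = i + 1 (0-based array, 1-based formula).
        double[] s = new double[n];
        double prefix = 0; // sum of the weights before position i
        for (int i = 0; i < n; i++) {
            if (!(weights[i] > 0)) {
                throw new IllegalArgumentException("weights must be positive");
            }
            s[i] = i * weights[i] + (n - 1) * prefix;
            prefix += weights[i];
        }
        double target = p * s[n - 1];
        // Find the last segment [s[i], s[i + 1]] whose left end is <= target.
        int i = 0;
        while (i < n - 2 && s[i + 1] <= target) {
            i++;
        }
        double fraction = (target - s[i]) / (s[i + 1] - s[i]);
        return sortedValues[i] + (sortedValues[i + 1] - sortedValues[i]) * fraction;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5};
        double[] unit = {1, 1, 1, 1, 1};
        System.out.println(percentile(values, unit, 0.5));
    }
}
```

With unit weights, S_k reduces to N*(k-1), so the interpolation fraction becomes p*(N-1) - (k-1), which is the R-7 fraction; that is consistent with the equal-weights behaviour described above.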
--
This message was sent by Atlassian Jira
(v8.3.4#803005)