[ 
https://issues.apache.org/jira/browse/MATH-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975027#comment-13975027
 ] 

Ted Dunning commented on MATH-418:
----------------------------------

As a clarification, the median and the mean are never separated by more than a 
single standard deviation.  The failure of binapprox on skewed data occurs 
because essentially all of the data can lie in the range [mean-sd, mean+sd] 
while the median sits far from the mean within that range.  A secondary 
failure is on sorted data.

For example, for Gamma(0.1, 0.1), the mean is 1, the sd is about 3.16 and the 
median is about 0.006.  About 93% of the distribution is in the range 
[mean-sd, mean+sd].
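
The Gamma(0.1, 0.1) figures can be checked numerically.  The sketch below is 
only an illustration, not part of binapprox or Commons Math.  It assumes the 
two parameters are shape and rate (the reading that gives mean 1, variance 10 
and sd sqrt(10)), and the helper names gamma_cdf, gamma_median and summary are 
made up for this example.  The CDF uses the standard power series for the 
regularized lower incomplete gamma function, and the median is found by 
bisection.

```python
import math


def gamma_cdf(x, shape, rate):
    """CDF of Gamma(shape, rate), via the power series for the
    regularized lower incomplete gamma function P(shape, rate * x)."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    # Series: sum over n >= 0 of z^n / (a (a+1) ... (a+n)), with a = shape.
    term = 1.0 / shape
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= z / (shape + n)
        total += term
    # Prefactor z^a * exp(-z) / Gamma(a), computed in log space.
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def gamma_median(shape, rate, iterations=200):
    """Median by bisection on [0, mean + sd]; the median is never more
    than one sd away from the mean, so this bracket contains it."""
    mean = shape / rate
    sd = math.sqrt(shape) / rate
    lo, hi = 0.0, mean + sd
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def summary(shape, rate):
    """Return (mean, sd, median, mass in [mean - sd, mean + sd])."""
    mean = shape / rate
    sd = math.sqrt(shape) / rate
    median = gamma_median(shape, rate)
    mass = gamma_cdf(mean + sd, shape, rate) - gamma_cdf(mean - sd, shape, rate)
    return mean, sd, median, mass


if __name__ == "__main__":
    mean, sd, median, mass = summary(0.1, 0.1)
    print(f"mean={mean:.3f} sd={sd:.3f} median={median:.4f} mass={mass:.3f}")
```

Since mean-sd is negative here, the mass in [mean-sd, mean+sd] is just the CDF 
at mean+sd.  The median lands near the left edge of the data while almost all 
of the distribution still falls inside the one-sd window.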

> add a storeless version of Percentile
> -------------------------------------
>
>                 Key: MATH-418
>                 URL: https://issues.apache.org/jira/browse/MATH-418
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 2.1
>            Reporter: Luc Maisonobe
>             Fix For: 4.0
>
>         Attachments: patch
>
>
> The Percentile class can handle only in-memory data.
> It would be interesting to use an on-line algorithm to estimate quantiles as 
> a storeless statistic.
> An example of such an algorithm is the exponentially weighted stochastic 
> approximation described in a 2000 paper by Fei Chen, Diane Lambert and 
> José C. Pinheiro, "Incremental Quantile Estimation for Massive Tracking", 
> which can be retrieved from CiteSeerX at 
> [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.1580].



