[ 
https://issues.apache.org/jira/browse/MATH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033967#comment-14033967
 ] 

Gilles commented on MATH-1129:
------------------------------

The [Javadoc|http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/stat/descriptive/rank/Percentile.html] for {{Percentile}} does warn about NaN values in the data:
{noformat}
To compute percentiles, the data must be at least partially ordered. Input 
arrays are copied and recursively partitioned using an ordering definition. The 
ordering used by Arrays.sort(double[]) is the one determined by 
Double.compareTo(Double). This ordering makes Double.NaN larger than any other 
value (including Double.POSITIVE_INFINITY). Therefore, for example, the median 
(50th percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.

Since percentile estimation usually involves interpolation between array 
elements, arrays containing NaN or infinite values will often result in NaN or 
infinite values returned.
{noformat}
but the caveat does not appear in {{DescriptiveStatistics}}.
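
The ordering described in that Javadoc excerpt can be checked with plain JDK calls. The following is only a minimal sketch: the class and method names are made up for illustration, and the median is computed as the plain average of the two middle elements of a fully sorted copy, not by calling {{Percentile}}.

```java
import java.util.Arrays;

class NanSortOrderSketch {
    /** Returns a sorted copy; Arrays.sort(double[]) places NaN after +Infinity. */
    static double[] sortedCopy(double[] data) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    /** Median of an even-length array: average of the two middle sorted elements. */
    static double medianOfEvenLength(double[] data) {
        double[] sorted = sortedCopy(data);
        int n = sorted.length;
        return (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Same values as the Javadoc example, with NaN deliberately placed first.
        double[] data = {Double.NaN, 4, 0, 3, 1, 2};
        System.out.println(Arrays.toString(sortedCopy(data)));
        System.out.println("median = " + medianOfEvenLength(data));
        // Primitive comparisons with NaN are always false,
        // whereas Double.compare defines a total order with NaN on top.
        System.out.println(Double.NaN < 1.0);
        System.out.println(Double.NaN > 1.0);
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0);
    }
}
```

Any partial sort that relies on {{<}} and {{>}} alone sees NaN as neither smaller nor larger than its neighbours, which is one plausible way for a result to depend on where the NaN sits in the input.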

Even when no NaN is returned, the result varies with the position of the NaN 
value in the data array. :(
It looks like the sorting is wrong in the presence of NaN. See below.

bq. This also creates doubts that the other methods handle NaN values correctly.

I don't know whether the intention was that the result should always be 
considered undefined in the presence of NaN.

Local sort
Without NaN: 25th percentile -0.1773147094639404 75th percentile 0.2748649403760461
With NaN: 25th percentile 0.24166759508327315 75th percentile -0.028075857595882995
With +inf: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497

java.util.Arrays.sort (sorting the whole data array)
Without NaN: 25th percentile -0.1773147094639404 75th percentile 0.2748649403760461
With NaN: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497
With +inf: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497
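
The full-sort figures can be reproduced without the library. This is a sketch under two assumptions: that the estimate follows the rule described in the {{Percentile}} Javadoc (position p * (n + 1) / 100, with linear interpolation between neighbouring elements), and that the "+inf" run simply replaces the NaN entry with {{Double.POSITIVE_INFINITY}}. The class and method names are illustrative only.

```java
import java.util.Arrays;

class FullSortPercentileSketch {
    // Finite values from the reported test case (the NaN entry is left out here).
    static final double[] FINITE = {
        -0.012086732064244697, -0.24975668704012527, 0.5706168483164684,
        -0.322111769955327, 0.24166759508327315, 0.16698443218942854,
        -0.10427763937565114, -0.15595963093172435, -0.028075857595882995,
        -0.24137994506058857, 0.47543170476574426, -0.07495595384947631,
        0.37445697625436497, -0.09944199541668033
    };

    /** Appends one extra value; its position is irrelevant because of the full sort. */
    static double[] withExtra(double extra) {
        double[] out = Arrays.copyOf(FINITE, FINITE.length + 1);
        out[FINITE.length] = extra;
        return out;
    }

    /** Percentile of a non-empty array, computed on a fully sorted copy. */
    static double percentile(double[] data, double p) {
        double[] sorted = data.clone();
        Arrays.sort(sorted); // NaN and +Infinity end up at the top
        int n = sorted.length;
        double pos = p * (n + 1) / 100.0;
        if (pos < 1) {
            return sorted[0];
        }
        if (pos >= n) {
            return sorted[n - 1];
        }
        int lowerIndex = (int) Math.floor(pos) - 1; // pos is 1-based
        double fraction = pos - Math.floor(pos);
        double lower = sorted[lowerIndex];
        double upper = sorted[lowerIndex + 1];
        return lower + fraction * (upper - lower);
    }

    public static void main(String[] args) {
        String[] labels = {"Without NaN", "With NaN", "With +inf"};
        double[][] inputs = {
            FINITE, withExtra(Double.NaN), withExtra(Double.POSITIVE_INFINITY)
        };
        for (int i = 0; i < inputs.length; i++) {
            System.out.println(labels[i]
                + ": 25th percentile " + percentile(inputs[i], 25)
                + " 75th percentile " + percentile(inputs[i], 75));
        }
    }
}
```

With 15 elements the 25th and 75th percentile positions are exactly 4 and 12, so no interpolation happens and the NaN or +inf sitting in position 15 never enters the result.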

I've attempted to fix the local sort:
Without NaN: 25th percentile -0.1773147094639404 75th percentile 0.2748649403760461
With NaN: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497
With +inf: 25th percentile -0.15595963093172435 75th percentile 0.37445697625436497

If nobody objects, I'll commit this modification, and further tests can be 
devised to ensure that it works correctly for other inputs.
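
One shape such a test could take is a position-invariance check: insert the special value at every index and require the same result each time. The sketch below is hypothetical and not part of the committed change. {{PercentileEstimator}} is a made-up interface, and the nearest-rank estimator is only a stand-in so that the example runs without the library; in a real test the estimator would wrap the library call, e.g. {{(d, p) -> new Percentile().evaluate(d, p)}}.

```java
import java.util.Arrays;

class NanPositionInvarianceSketch {
    /** Hypothetical hook for the implementation under test. */
    interface PercentileEstimator {
        double evaluate(double[] data, double p);
    }

    /** Returns a copy of values with special inserted at the given index. */
    static double[] insertAt(double[] values, int index, double special) {
        double[] out = new double[values.length + 1];
        System.arraycopy(values, 0, out, 0, index);
        out[index] = special;
        System.arraycopy(values, index, out, index + 1, values.length - index);
        return out;
    }

    /** True if the estimate is identical wherever the special value is placed. */
    static boolean isInvariantToPosition(double[] values, double special, double p,
                                         PercentileEstimator estimator) {
        double expected = estimator.evaluate(insertAt(values, 0, special), p);
        for (int i = 1; i <= values.length; i++) {
            double actual = estimator.evaluate(insertAt(values, i, special), p);
            // Double.compare treats NaN as equal to NaN, unlike ==.
            if (Double.compare(expected, actual) != 0) {
                return false;
            }
        }
        return true;
    }

    /** Stand-in estimator: nearest rank on a fully sorted copy (non-empty input). */
    static double nearestRankOnSortedCopy(double[] data, double p) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] values = {3, 1, 2, 0, 4};
        for (double p : new double[] {25, 50, 75}) {
            System.out.println("p=" + p + " invariant: " + isInvariantToPosition(
                values, Double.NaN, p,
                NanPositionInvarianceSketch::nearestRankOnSortedCopy));
        }
    }
}
```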


> Percentile Computation errs
> ---------------------------
>
>                 Key: MATH-1129
>                 URL: https://issues.apache.org/jira/browse/MATH-1129
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.2
>         Environment: Java 1.8.0
>            Reporter: Carl Witt
>
> In the following test, the 75th percentile is _smaller_ than the 25th 
> percentile, leaving me with a negative interquartile range.
> {code:title=Bar.java|borderStyle=solid}
> @Test public void negativePercentiles(){
>         double[] data = new double[]{
>                 -0.012086732064244697, 
>                 -0.24975668704012527, 
>                 0.5706168483164684, 
>                 -0.322111769955327, 
>                 0.24166759508327315, 
>                 Double.NaN, 
>                 0.16698443218942854, 
>                 -0.10427763937565114, 
>                 -0.15595963093172435, 
>                 -0.028075857595882995, 
>                 -0.24137994506058857, 
>                 0.47543170476574426, 
>                 -0.07495595384947631, 
>                 0.37445697625436497, 
>                 -0.09944199541668033
>         };
>         DescriptiveStatistics descriptiveStatistics = new DescriptiveStatistics(data);
>         double threeQuarters = descriptiveStatistics.getPercentile(75);
>         double oneQuarter = descriptiveStatistics.getPercentile(25);
>         double IQR = threeQuarters - oneQuarter;
>         
>         System.out.println(String.format("25th percentile %s 75th percentile %s", oneQuarter, threeQuarters ));
>         
>         assert IQR >= 0;
>         
>     }
> {code}



