Looks like my evenings this week (after today) will be open. I was thinking 
about coding up a potentially major overhaul of axes.Axes.boxplot. Here's a 
rough outline of what I have in mind:

1) Improve the bootstrapping of the confidence intervals around the median
2) Add support for masked arrays (i.e., let the user specify whether masked 
values should be considered or not -- currently they are always considered, IIRC)
3) Improve the calculation of the percentiles to be consistent with SciPy and R.
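
To make items 1 and 2 a bit more concrete, here is a rough sketch of a percentile bootstrap for the median that drops masked values before resampling. The function name, its parameters, and the choice of a plain percentile bootstrap are all assumptions for illustration, not what boxplot currently does:

```python
import numpy as np

def bootstrap_median_ci(data, n_boot=1000, alpha=0.05, seed=None):
    """Percentile-bootstrap CI for the median (hypothetical helper)."""
    # Masked arrays: keep only the unmasked entries. Plain sequences pass through.
    if np.ma.isMaskedArray(data):
        values = data.compressed()
    else:
        values = np.asarray(data, dtype=float).ravel()

    rng = np.random.default_rng(seed)
    # One row of resampled indices (with replacement) per bootstrap replicate.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    medians = np.median(values[idx], axis=1)

    # Take the central (1 - alpha) range of the bootstrapped medians.
    lo, hi = np.percentile(medians, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

Whether masked values should be dropped by default or only on request is exactly the kind of thing I'd want input on.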

#1 seems like a nice-to-have. #2 seems pretty essential to me. The third 
improvement is something for which I would want y'all's blessing before 
moving ahead. However, I think it's pretty critical. Compare the 25th and 
75th percentiles below:

import numpy as np
import matplotlib.mlab as mlab
import scipy.stats as stats

def comparePercentiles(x):
    mlp = mlab.prctile(x)
    stp = np.array([])
    for p in (0.0, 25.0, 50.0, 75.0, 100.0):
        stp = np.hstack([stp, stats.scoreatpercentile(x, p)])
    outstring = """
    mlab \t scipy
    -------------
    %0.3f \t %0.3f (0th)
    %0.3f \t %0.3f (25th)
    %0.3f \t %0.3f (50th)
    %0.3f \t %0.3f (75th)
    %0.3f \t %0.3f (100th)
    """ % (mlp[0], stp[0], mlp[1], stp[1], mlp[2], stp[2],
           mlp[3], stp[3], mlp[4], stp[4])
    print(outstring)

>>> comparePercentiles(x)

    mlab         scipy
    ----------------------
    -1.245   -1.245 (0th)
    -0.950   -0.802 (25th)
    -0.162   -0.162 (50th)
    0.571    0.266 (75th)
    1.067    1.067 (100th)

Copying and pasting the exact same data into R I get:
> quantile(x, probs=c(0.0, 0.25, 0.50, 0.75, 1.0))
        0%        25%        50%        75%       100%
-1.2448508 -0.8022337 -0.1617812  0.2661112  1.0666244


It seems clear that something needs to be done. AFAICT, scipy is not listed 
as a dependency of matplotlib, so it'll probably be easier to retool 
mlab.prctile to return values that agree with scipy and R. What do you 
think? Would this be a welcome contribution?
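
For the sake of discussion, here is a minimal numpy-only sketch of the kind of calculation I mean: linear interpolation between the sorted values, which as far as I can tell is what scipy.stats.scoreatpercentile and R's default quantile() do. The name prctile_linear is hypothetical, and this is not the existing mlab code:

```python
import numpy as np

def prctile_linear(x, p=(0.0, 25.0, 50.0, 75.0, 100.0)):
    """Percentiles by linear interpolation between sorted values (sketch)."""
    values = np.sort(np.asarray(x, dtype=float).ravel())
    p = np.atleast_1d(np.asarray(p, dtype=float))

    # Fractional 0-based position of each percentile in the sorted data.
    pos = p / 100.0 * (values.size - 1)
    lower = np.floor(pos).astype(int)
    # Clamp so the 100th percentile does not index past the end.
    upper = np.minimum(lower + 1, values.size - 1)
    frac = pos - lower

    # Weighted average of the two neighbouring sorted values.
    return values[lower] * (1.0 - frac) + values[upper] * frac
```

It assumes non-empty input with no NaNs. For x = [1, 2, 3, 4] it gives 1.75, 2.5 and 3.25 for the 25th, 50th and 75th percentiles.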

Thanks,
-Paul Hobson


_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
