On Tue, Dec 15, 2009 at 9:57 AM, Andrew Straw <straw...@astraw.com> wrote:
>
>   notch_max = med + 1.57*iq/np.sqrt(row)
>   notch_min = med - 1.57*iq/np.sqrt(row)
>
> Is this code actually calculating a meaningful value? If so, what?
>

From the statistics ignoramus in the room, so take this with a grain
of salt...  I'd write that code as

notch_max = med + (iq/2) * (np.pi/np.sqrt(row))

and it makes more sense (reading the 1.57 as pi/2).  The notch limits
are an estimate of the interval around the median: in each direction
(up/down) it extends by one-half of the q3-q1 range times a
normalization factor of pi/sqrt(n), where n == row == len(d).  The
1/sqrt(n) makes some sense, as it's the usual statistical error
normalization factor.  The multiplication by pi I'm not so sure about,
and I can't find that exact formula in any quick stats reference, but
I'm sure someone who actually knows stats can point out where it comes
from.

Note that the code below does:

                if notch_max > q3:
                    notch_max = q3
                if notch_min < q1:
                    notch_min = q1

though the Matlab documentation explicitly states, in:

http://www.mathworks.com/access/helpdesk/help/toolbox/stats/boxplot.html

that

"""
Interval endpoints are the extremes of the notches or the centers of
the triangular markers. When the sample size is small, notches may
extend beyond the end of the box.
"""

So it seems to me that the more principled thing to do would be to
leave those notch markers outside the box if they land there, because
that's a warning about the robustness of the estimate. Clipping them
to q1/q3 is effectively hiding a problem...
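
As a rough illustration of what the clipping hides, the sketch below
computes the notches for a tiny sample with and without clipping. The
function notches, its clip flag and the five-point sample are
hypothetical; only the 1.57 constant and the clipping logic are taken
from the quoted code:

```python
import numpy as np

def notches(d, clip=False):
    """Return (notch_min, notch_max, outside) for 1-D data d.

    Hypothetical helper: 'outside' reports whether the unclipped
    notches extend beyond the q1/q3 box edges.
    """
    d = np.asarray(d, dtype=float)
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    half_width = 1.57 * (q3 - q1) / np.sqrt(d.size)
    notch_min, notch_max = med - half_width, med + half_width
    outside = bool(notch_min < q1 or notch_max > q3)
    if clip:
        # Same effect as the clipping in the code quoted above.
        notch_min, notch_max = max(notch_min, q1), min(notch_max, q3)
    return notch_min, notch_max, outside

small = [1, 2, 3, 4, 5]                  # n = 5, so the notches are wide
print(notches(small))                    # unclipped: notches pass q1 and q3
print(notches(small, clip=True))         # clipped: notches sit on the box
```

With clipping on, the plot for this sample would look no different from
one whose notches just reach the box edges, which is exactly the
information loss described above.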


cheers,

f

_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
