Andrew Straw wrote:
> Also, I think that formula is only for normally distributed data. Which, 
> especially if you're using boxplots, medians, and quartiles, may not be 
> a valid assumption.
>
> Maybe we should at least raise a warning when someone uses notch=1. The 
> current implementation seems dubious, at best, IMO.
>   

(I sent the previous version of this email a bit too early -- this is
slightly edited for clarity.)

I read the following reference:

McGill, R., Tukey, J.W., and Larsen, W.A. (1978) "Variations of
Boxplots", The American Statistician, 32:12-16.

McGill et al. have an entire section devoted to "Choice of Notch Size",
starting with:

"In notched box plots, one is, of course, faced with the question of how
best to determine the widths of the notches. Many methods, both
classical and non-parametric, might be considered. None will likely be
best in all cases."

They then describe a suggestion based on the Gaussian-based asymptotic
approximation (Kendall and Stuart, 1967). Here the standard deviation of
the median is given by

s = 1.25*R / (1.35 * sqrt(N))

where R is the interquartile range and N is the number of observations.

Using this value for s, the notch around each median should be M +/- Cs
where C is a constant. To summarize this section of their paper, values
of C  between 1.386 and 1.96 could be justified depending on the
standard deviations, and they choose C=1.7 empirically as preferable and
ultimately give the full equation for notches to be

M +/-  1.7* (1.25*R / (1.35 * sqrt(N)))

But they end the section with:

"Clearly, a variety of other choices, such as a single less conservative
value (<1.7) or one dependent upon the data (chosen to compromise over
the range of the ratios of the spreads involved), are possible and may
be preferable in certain cases."

The thing not done in this article is to display outliers -- they refer
the reader to "schematic plots" in Tukey's 1977 book titled Exploratory
Data Analysis (Addison-Wesley). In the version of boxplots described in
this paper, the whiskers go to the data extremes.


------------------------------------------------------------------------------
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev 
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel

Reply via email to