Andrew Straw wrote:
> Also, I think that formula is only for normally distributed data. Which,
> especially if you're using boxplots, medians, and quartiles, may not be
> a valid assumption.
>
> Maybe we should at least raise a warning when someone uses notch=1. The
> current implementation seems dubious, at best, IMO.

(I sent the previous version of this email a bit too early -- this is slightly edited for clarity.)

I read the following reference:

McGill, R., Tukey, J.W., and Larsen, W.A. (1978) "Variations of Boxplots", The American Statistician, 32:12-16.

McGill et al. have an entire section devoted to "Choice of Notch Size", starting with:

"In notched box plots, one is, of course, faced with the question of how best to determine the widths of the notches. Many methods, both classical and non-parametric, might be considered. None will likely be best in all cases."

They then describe a suggestion based on the Gaussian-based asymptotic approximation (Kendall and Stuart, 1967). Here the standard deviation of the median is given by

    s = 1.25*R / (1.35 * sqrt(N))

where R is the interquartile range and N is the number of observations. Using this value for s, the notch around each median M should be

    M +/- C*s

where C is a constant.

To summarize this section of their paper: values of C between 1.386 and 1.96 could be justified, depending on the standard deviations of the groups being compared. They choose C=1.7 empirically as preferable and ultimately give the full equation for the notches as

    M +/- 1.7 * (1.25*R / (1.35 * sqrt(N)))

But they end the section with:

"Clearly, a variety of other choices, such as a single less conservative value (<1.7) or one dependent upon the data (chosen to compromise over the range of the ratios of the spreads involved), are possible and may be preferable in certain cases."

One thing this article does not do is display outliers -- for that, they refer the reader to "schematic plots" in Tukey's 1977 book, Exploratory Data Analysis (Addison-Wesley). In the version of boxplots described in this paper, the whiskers go to the data extremes.
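For concreteness, here is a minimal sketch of the notch formula quoted above, M +/- C * (1.25*R / (1.35 * sqrt(N))). It is not the current matplotlib implementation. The function name `mcgill_notch_limits` is hypothetical, and the quartile rule (linear interpolation, via `method="inclusive"`) is my assumption; the paper may define the quartiles slightly differently.

```python
import math
import statistics


def mcgill_notch_limits(data, c=1.7):
    """Return (lower, median, upper) notch limits for one sample.

    Implements M +/- c * 1.25 * R / (1.35 * sqrt(N)), where R is the
    interquartile range and N the number of observations.
    The quartile definition used here is an assumption, not taken
    from the paper.
    """
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")

    med = statistics.median(data)
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Gaussian-based asymptotic standard deviation of the median.
    s = 1.25 * iqr / (1.35 * math.sqrt(n))
    return med - c * s, med, med + c * s


if __name__ == "__main__":
    lower, med, upper = mcgill_notch_limits([1, 2, 3, 4, 5, 6, 7, 8, 9])
    print(f"median = {med}, notch = [{lower:.3f}, {upper:.3f}]")
```

Passing a different `c` (anywhere in the 1.386 to 1.96 range discussed in the paper) shows how strongly the notch width depends on that choice.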
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-devel