re: [R] boxplot notches

Michael Friendly Tue, 02 Mar 2004 06:27:27 -0800

I think John Tukey's idea was that this formula (or just the fact of
using median and quartiles) is still often approximately correct
for quite a few kinds of moderate contaminations...
It may be approximately correct for the width of a CI (and when I checked it was only approximately correct for a normal), but I would seriously doubt that it is approximately correct for a significance level of 5%. Remember how fast the tails of the asymptotic normal distribution decay: a 20% error in the interval width turns 5% into 2%.
BTW, if there is a precise reference for this it would be good to add it
to boxplot.stats.Rd, as the confidence limits are unexplained there.
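
The tail-decay remark above is easy to check numerically. What follows is only a minimal sketch using Python's standard library, assuming that "a 20% error" means an interval 20% wider than the nominal 1.96 standard errors; the helper name two_sided_p is made up for this illustration.

```python
from statistics import NormalDist


def two_sided_p(z_value):
    """Two-sided tail probability of a standard normal beyond |z_value|."""
    return 2 * (1 - NormalDist().cdf(abs(z_value)))


# Nominal 95% critical value, about 1.96.
z95 = NormalDist().inv_cdf(0.975)

# Significance level at the nominal width, and at a width 20% too large.
nominal = two_sided_p(z95)
inflated = two_sided_p(1.2 * z95)

print(f"nominal level:  {nominal:.3f}")
print(f"20% too wide:   {inflated:.3f}")
```

Under that assumption the second figure comes out a little under 0.02, consistent with the "5% into 2%" statement.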

The factor 1.58 multiplying H-spr/\sqrt{n} is the product of three approximations that take a 95% confidence interval for a difference in means to one for a difference in medians, using the H-spread (H-spr = IQR) in place of the standard deviation:

H-spr/1.349 \approx \sigma for a normal distribution
\sqrt{\pi/2} \approx (std error of a median)/(std error of a mean) for normal data
1.7/\sqrt{n}, where 1.7 is the average of 1.96 and 1.39 = 1.96/\sqrt{2}, the factors for the standard error of the difference between two means in the cases where one variance is tiny and where both are equal.

Together: (H-spr/1.349) \sqrt{\pi/2} (1.7/\sqrt{n}) \approx 1.58 H-spr/\sqrt{n}.
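
As a rough check of the arithmetic, the three approximations can be multiplied out. This is a sketch in Python's standard library rather than the code behind boxplot.stats; the function notch_limits and its arguments are hypothetical names for this illustration, and the caller is assumed to supply the median, the H-spread and n.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# 1. H-spread (IQR) of a normal distribution in units of sigma, about 1.349.
iqr_per_sigma = 2 * std_normal.inv_cdf(0.75)

# 2. Large-sample ratio of the std error of a median to that of a mean
#    for normal data, about 1.2533.
median_se_ratio = math.sqrt(math.pi / 2)

# 3. Average of 1.96 and 1.96/sqrt(2), about 1.67, rounded to 1.7 in the text.
z95 = std_normal.inv_cdf(0.975)
avg_factor = (z95 + z95 / math.sqrt(2)) / 2

# Product of the three pieces, using the rounded 1.7 as in the text.
notch_factor = 1.7 * median_se_ratio / iqr_per_sigma


def notch_limits(median, h_spread, n, factor=1.58):
    """Return (lower, upper) notch limits: median -/+ factor * h_spread / sqrt(n)."""
    half_width = factor * h_spread / math.sqrt(n)
    return median - half_width, median + half_width


print(f"IQR/sigma = {iqr_per_sigma:.3f}, sqrt(pi/2) = {median_se_ratio:.4f}")
print(f"average factor = {avg_factor:.3f}, notch factor = {notch_factor:.2f}")
print(notch_limits(median=10.0, h_spread=4.0, n=16))
```

With the rounded 1.7 the product rounds to 1.58; using the unrounded average (about 1.67) would give a slightly smaller factor, which shows how approximate the constant is.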

I believe this is explained in

@Article{McGill-etal:78,
 author =       "R. McGill and J. W. Tukey and W. Larsen",
 year =         "1978",
 title =        "Variations of Box Plots",
 journal =      TAS,
 volume =       "32",
 pages =        "12--16",
}

-- 
Michael Friendly     Email: [EMAIL PROTECTED]
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA
