John Tukey studied outliers extensively in his exploratory data analysis. For a box plot he measures the IQR (interquartile range) of the data set, then adds a multiple of the IQR (conventionally 1.5) to the upper quartile of the box and subtracts the same amount from the lower quartile. Data values outside these "fences" (in Tukey speak; the quartiles themselves are his "hinges") are outliers.
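
A minimal Python sketch of the fence rule described above. It is not McNeil's code; the function names (tukey_fences, split_outliers) and the choice of 1.5 as the default multiplier are my own assumptions for illustration, and the quartiles are taken as medians of the lower and upper halves of the sorted data. The samples are copied from the quoted message.

```python
from statistics import median

# Samples t, copied from the quoted message (100 values).
t = [
    11, 10, 10, 10, 10, 11, 10, 10, 10, 10, 9, 11, 10, 11, 10, 10, 11, 10, 11, 10, 11, 10, 10,
    11, 10, 11, 10, 10, 10, 11, 10, 74, 11, 11, 14, 11, 11, 10, 12, 11, 15, 14, 12, 11,
    11, 11, 11, 11, 10, 12, 11, 11, 11, 10, 11, 11, 11, 10, 11, 11, 10, 11, 161241, 49,
    32, 12, 11, 11, 12, 10, 11, 10, 12, 11, 12, 11, 11, 12, 11, 11, 12, 11, 11, 11, 12,
    11, 11, 12, 11, 11, 11, 11, 11, 11, 11, 10, 11, 11, 12, 12,
]

def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) for the given multiplier k."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2               # for odd n the median sits in both halves
    lower_q = median(xs[:half])       # lower quartile (hinge)
    upper_q = median(xs[n - half:])   # upper quartile (hinge)
    iqr = upper_q - lower_q
    return lower_q - k * iqr, upper_q + k * iqr

def split_outliers(values, k=1.5):
    """Split values into (kept, outliers), preserving the original order."""
    lo, hi = tukey_fences(values, k)
    kept = [v for v in values if lo <= v <= hi]
    outliers = [v for v in values if v < lo or v > hi]
    return kept, outliers

kept, outliers = split_outliers(t)
print(tukey_fences(t))   # fences for k = 1.5
print(outliers)          # values flagged as outliers
```

On these samples the quartiles are 10 and 11, so the fences are 8.5 and 12.5. The rule therefore flags 161241, 74, 49 and 32, but also 15 and both 14s. A larger multiplier (for example k=3) is more lenient and leaves the 14s in.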
The code below is from Donald R. McNeil's IDA, A Practical Primer.
http://www.pixentral.com/show.php?picture=1Fnz2FOWX9nuYzndC9GbDbi2z1yz50

---
(B=)

On Jan 9, 2012, at 7:49 PM, Roger Hui wrote:

> I wonder if there are well-known techniques in statistics for dealing with
> the following problem.
>
>    t
> 11 10 10 10 10 11 10 10 10 10 9 11 10 11 10 10 11 10 11 10 11 10 10
> 11 10 11 10 10 10 11 10 74 11 11 14 11 11 10 12 11 15 14 12 11
> 11 11 11 11 10 12 11 11 11 10 11 11 11 10 11 11 10 11 161241 49
> 32 12 11 11 12 10 11 10 12 11 12 11 11 12 11 11 12 11 11 11 12
> 11 11 12 11 11 11 11 11 11 11 10 11 11 12 12
>
> t is a set of samples from a noisy source which is supposed to give the
> same integer answer. Obviously, 161241 is an "outlier", and it is likely
> that 74, 49, or even 32 are outliers too. Are there standard techniques
> for discarding outliers to clean up the data, before the application of
> statistical tests such as the means test or large sample test?
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
