Apart from the interquartile range, you may want to consider the following:
1- Eliminate, or replace by the average of the remaining data, any observation more than n standard deviations away from the average. n=3 or 4 are good values unless you assume the distribution of your observations to have very heavy tails (for a normal distribution, n=3 is enough).
2- Eliminate or replace the n largest and n smallest values in your sample (applied to the average, eliminating them gives the "trimmed mean" and replacing them gives the "Winsorized mean").

-----Original Message-----
From: [email protected] [mailto:[email protected]] On behalf of km
Sent: 10 January 2012 14:55
To: Chat forum
Subject: Re: [Jchat] discarding outliers

Also Moore and McCabe give explicit directions for finding Q1 and Q3:

-------------

To calculate the quartiles:

1. Arrange the observations in increasing order and locate the median M in the ordered list of observations.
2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.

--------------

The five-number summary, another Tukey invention, is

Minimum Q1 M Q3 Maximum

For the tiny data set 1 2 2 4 6 the five-number summary is

1 1.5 2 5 6

Kip Murray

Sent from my iPad

On Jan 9, 2012, at 11:27 PM, Brian Schott <[email protected]> wrote:

> Yes, I agree with Kip.
>
> Some of my Tukey speak was incorrect. Tukey called the thresholds of 1.5*IQR
> "inner fences" and those of 3*IQR "outer fences", and the violating data
> points were "outside" or "far out" depending on how far out they were.
> So the Tukey multipliers were 1.5 and 3.0 or, if you use his term for
> 1.5*IQR ("step"), then the multipliers are 1 and 2 steps.
>
> And btw, the lower and upper end of the box are the "hinges". I just
> dug out my copy of Tukey's Exploratory Data Analysis, Addison-Wesley
> 1977.
>
> On Tue, Jan 10, 2012 at 12:16 AM, km <[email protected]> wrote:
>> From Moore and McCabe, Introduction to the Practice of Statistics (2003) p. 46
>>
>> --------------
>>
>> The interquartile range IQR is the distance between the first and third quartiles.
>> IQR = Q3 - Q1
>>
>> The 1.5 x IQR Criterion for Outliers
>> Call an observation a suspected outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
>>
>> ---------------
>>
>> (You should informally investigate suspected outliers, looking for a reason to throw them out.)
>>
>> Kip Murray
>>
>> Sent from my iPad
>>
>>
>> On Jan 9, 2012, at 6:49 PM, Roger Hui <[email protected]> wrote:
>>
>>> I wonder if there are well-known techniques in statistics for dealing with
>>> the following problem.
>>>
>>> t
>>> 11 10 10 10 10 11 10 10 10 10 9 11 10 11 10 10 11 10 11 10 11 10 10
>>> 11 10 11 10 10 10 11 10 74 11 11 14 11 11 10 12 11 15 14 12 11
>>> 11 11 11 11 10 12 11 11 11 10 11 11 11 10 11 11 10 11 161241 49
>>> 32 12 11 11 12 10 11 10 12 11 12 11 11 12 11 11 12 11 11 11 12
>>> 11 11 12 11 11 11 11 11 11 11 10 11 11 12 12
>>>
>>> t is a set of samples from a noisy source which is supposed to give the
>>> same integer answer. Obviously, 161241 is an "outlier", and it is likely
>>> that 74, 49, or even 32 are outliers too. Are there standard techniques
>>> for discarding outliers to clean up the data, before the application of
>>> statistical tests such as the means test or large sample test?
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> (B=) <-----my sig
> Brian Schott
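
For anyone who wants to try the rules from this thread on Roger's t, here is a minimal Python sketch (Python rather than J, stdlib only). The function names are my own and purely illustrative, and it assumes at least two observations. It follows the Moore and McCabe quartile rule quoted in the thread, the 1.5 x IQR criterion, a single-pass n-sigma rule, and a Winsorized mean; it is not meant as the definitive way to do any of these.

```python
from statistics import mean, median, stdev

# Roger's sample t, copied from the thread (100 values).
t = [11, 10, 10, 10, 10, 11, 10, 10, 10, 10, 9, 11, 10, 11, 10, 10, 11, 10, 11, 10, 11, 10, 10,
     11, 10, 11, 10, 10, 10, 11, 10, 74, 11, 11, 14, 11, 11, 10, 12, 11, 15, 14, 12, 11,
     11, 11, 11, 11, 10, 12, 11, 11, 11, 10, 11, 11, 11, 10, 11, 11, 10, 11, 161241, 49,
     32, 12, 11, 11, 12, 10, 11, 10, 12, 11, 12, 11, 11, 12, 11, 11, 12, 11, 11, 11, 12,
     11, 11, 12, 11, 11, 11, 11, 11, 11, 11, 10, 11, 11, 12, 12]

def quartiles(data):
    """Return (Q1, M, Q3) using the Moore and McCabe rule quoted in the thread."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # observations left of the median's location
    upper = xs[half + n % 2:]    # observations right of it (skips the middle one if n is odd)
    return median(lower), median(xs), median(upper)

def five_number_summary(data):
    """Minimum, Q1, M, Q3, Maximum."""
    q1, m, q3 = quartiles(data)
    return min(data), q1, m, q3, max(data)

def iqr_suspects(data, k=1.5):
    """Observations more than k x IQR below Q1 or above Q3 (k=1.5 is the quoted criterion)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]

def sigma_suspects(data, n=3.0):
    """Observations more than n sample standard deviations from the mean (one pass only)."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) > n * s]

def winsorized_mean(data, n=1):
    """Mean after replacing the n smallest and n largest values by their nearest kept neighbours."""
    xs = sorted(data)
    if 2 * n >= len(xs):
        raise ValueError("n is too large for this sample")
    core = xs[n:len(xs) - n]
    return mean([core[0]] * n + core + [core[-1]] * n)

print(five_number_summary([1, 2, 2, 4, 6]))   # Kip's tiny data set
print(sorted(iqr_suspects(t)))
print(sigma_suspects(t))
```

On Roger's t this gives Q1 = 10 and Q3 = 11, so the 1.5 x IQR fences sit at 8.5 and 12.5 and the suspects are 14, 14, 15, 32, 49, 74 and 161241. The single-pass 3-sigma rule flags only 161241, because that one value inflates the standard deviation so much that 74, 49 and 32 hide behind it; you would have to repeat the pass on the remaining data to catch them.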
