Re: [Jchat] discarding outliers

Roger Hui Tue, 10 Jan 2012 08:59:59 -0800

Thanks to you and all other respondents for their helpful replies.

Do Moore & McCabe offer any guidance on how to compute the medians?
Wikipedia says "there is no universal agreement on choosing the quartile
values".  As well, in computing the IQR I have seen methods that make sense
to me but are quite tricky depending on whether #x is odd or even.


E.g., suppose x is 1 2 3 4 5, 6 7 8 9 10.  The descriptions I have seen say
that the median is 5.5.  When you then compute the median of the lower half
(q1), you exclude the 5.5, and report that q1 is 3, and likewise q3 is 8.
In contrast, if y is 1 2 3 4 5, 6, 7 8 9 10 11, the median is 6, but when
you compute q1 you *include* the 6 with the lower half, and report that q1
is 3.5, likewise you include 6 with the upper half so that q3 is 8.5.
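
The two conventions can be compared with a short Python sketch. This is only an illustration of the description above, not code from any textbook or library; the names `median`, `quartiles` and the `include_median` flag are made up for this example.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def quartiles(values, include_median):
    """Return (q1, q3) as the medians of the lower and upper halves.

    include_median (a hypothetical flag) only matters when the number
    of observations is odd: it decides whether the middle observation
    is counted in both halves or in neither.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    if n % 2 and include_median:
        lower, upper = s[:half + 1], s[half:]
    else:
        lower, upper = s[:half], s[n - half:]
    return median(lower), median(upper)


x = list(range(1, 11))  # 1 2 ... 10
y = list(range(1, 12))  # 1 2 ... 11

print(quartiles(x, include_median=True))   # (3, 8)
print(quartiles(y, include_median=True))   # (3.5, 8.5)
print(quartiles(y, include_median=False))  # (3, 9)
```

For y, leaving the median out of both halves gives q1 = 3 and q3 = 9, so the IQR is 6 instead of 5. For x the flag makes no difference, because no observation sits at the median's position.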



On Tue, Jan 10, 2012 at 5:54 AM, km <[email protected]> wrote:

> Also Moore and McCabe give explicit directions for finding Q1 and Q3:
>
> -------------
> To calculate the quartiles:
>
> 1. Arrange the observations in increasing order and locate the median M in
> the ordered list of observations.
>
> 2. The first quartile Q1 is the median of the observations whose position
> in the ordered list is to the left of the location of the overall median.
>
> 3. The third quartile Q3 is the median of the observations whose position
> in the ordered list is to the right of the location of the overall median.
> --------------
>
> The five-number summary, another Tukey invention, is
>
> Minimum Q1 M Q3 Maximum
>
> For the tiny data set 1 2 2 4 6 the five-number summary is
>
> 1 1.5 2 5 6
>
>
> Kip Murray
>
> Sent from my iPad
>
>
> On Jan 9, 2012, at 11:27 PM, Brian Schott <[email protected]> wrote:
>
> > Yes, I agree with Kip.
> >
> > Some of my Tukey speak was incorrect. Tukey called the thresholds of
> > 1.5*IQR "inner fences" and those of 3*IQR "outer fences", and the violating
> > data points were "outside" or "far out" depending on how far out they were.
> > So the Tukey multipliers were 1.5 and 3.0 or if you use his term for
> > 1.5*IQR ("step"), then the multipliers are 1 and 2 steps.
> >
> > And btw, the lower and upper end of the box are the "hinges". I just
> > dug out my copy of Tukey's Exploratory Data Analysis, Addison-Wesley
> > 1977.
> >
> > On Tue, Jan 10, 2012 at 12:16 AM, km <[email protected]> wrote:
> >> From Moore and McCabe, Introduction to the Practice of Statistics
> >> (2003) p. 46
> >>
> >> --------------
> >>
> >> The interquartile range IQR is the distance between the first and third
> >> quartiles.
> >> IQR = Q3 - Q1
> >>
> >> The 1.5 x IQR Criterion for Outliers
> >> Call an observation a suspected outlier if it falls more than 1.5 x IQR
> >> above the third quartile or below the first quartile.
> >>
> >> ---------------
> >>
> >> (You should informally investigate suspected outliers, looking for a
> >> reason to throw them out.)
> >>
> >> Kip Murray
> >>
> >> Sent from my iPad
> >>
> >>
> >> On Jan 9, 2012, at 6:49 PM, Roger Hui <[email protected]> wrote:
> >>
> >>> I wonder if there are well-known techniques in statistics for dealing
> >>> with the following problem.
> >>>
> >>>      t
> >>> 11 10 10 10 10 11 10 10 10 10 9 11 10 11 10 10 11 10 11 10 11 10 10
> >>>      11 10 11 10 10 10 11 10 74 11 11 14 11 11 10 12 11 15 14 12 11
> >>>      11 11 11 11 10 12 11 11 11 10 11 11 11 10 11 11 10 11 161241 49
> >>>      32 12 11 11 12 10 11 10 12 11 12 11 11 12 11 11 12 11 11 11 12
> >>>      11 11 12 11 11 11 11 11 11 11 10 11 11 12 12
> >>>
> >>> t is a set of samples from a noisy source which is supposed to give the
> >>> same integer answer.  Obviously, 161241 is an "outlier", and it is likely
> >>> that 74, 49, or even 32 are outliers too.  Are there standard techniques
> >>> for discarding outliers to clean up the data, before the application of
> >>> statistical tests such as the means test or large sample test?
>
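
For reference, here is a rough Python sketch of the 1.5 x IQR criterion quoted above, applied to the original data t. It assumes the Moore and McCabe directions for Q1 and Q3 (medians of the observations to the left and right of the overall median's position); the function names and the parameter `k` are invented for this example, and other quartile conventions could flag a slightly different set.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def suspected_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3, in original order.

    Q1 and Q3 are the medians of the observations strictly to the left
    and right of the overall median's position (an assumption taken
    from the Moore and McCabe directions quoted in the thread).
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    q1 = median(s[:half])
    q3 = median(s[n - half:])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# The 100 samples from the original question.
t = [
    11, 10, 10, 10, 10, 11, 10, 10, 10, 10, 9, 11, 10, 11, 10, 10, 11, 10,
    11, 10, 11, 10, 10,
    11, 10, 11, 10, 10, 10, 11, 10, 74, 11, 11, 14, 11, 11, 10, 12, 11, 15,
    14, 12, 11,
    11, 11, 11, 11, 10, 12, 11, 11, 11, 10, 11, 11, 11, 10, 11, 11, 10, 11,
    161241, 49,
    32, 12, 11, 11, 12, 10, 11, 10, 12, 11, 12, 11, 11, 12, 11, 11, 12, 11,
    11, 11, 12,
    11, 11, 12, 11, 11, 11, 11, 11, 11, 11, 10, 11, 11, 12, 12,
]

print(suspected_outliers(t))         # [74, 14, 15, 14, 161241, 49, 32]
print(suspected_outliers(t, k=3))    # [74, 15, 161241, 49, 32]
print(suspected_outliers([1, 2, 2, 4, 6]))  # []
```

On t this gives Q1 = 10 and Q3 = 11, so the IQR is only 1 and the criterion also flags 14 and 15, not just the values that look wrong at a glance. With k=3, the outer-fence multiplier mentioned earlier in the thread, the two 14s are no longer flagged. Either way these are only suspected outliers to be investigated, as the quoted text says.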
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
