I think the ambiguity comes from the comparison verb you implicitly use to partition the vector around the median. You might call 5.5 the median in the even case because that allows partitioning using only "<"; however, you could call 5 the median if you use "vals <: median" (less than or equal). Also, 5.5 is your best guess if your sequence is a sample.
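To make the two conventions concrete, here is a minimal Python sketch (Python rather than J, purely for illustration). It assumes the even-length example 1..10 from the thread; the variable names are my own, and `statistics.median` / `statistics.median_low` stand in for the "5.5" and "5" choices of median.

```python
import statistics

vals = list(range(1, 11))  # the even-length example from the thread: 1..10

# Convention A: average the two middle values, then partition with strict "<".
avg_median = statistics.median(vals)
lower_a = [v for v in vals if v < avg_median]

# Convention B: take the lower middle value, then partition with "<="
# (the comparison written "<:" in J).
low_median = statistics.median_low(vals)
lower_b = [v for v in vals if v <= low_median]

print(avg_median, lower_a)
print(low_median, lower_b)
```

Both conventions give the same lower half here; only the value reported as "the median" differs.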
The quartile case is more complicated because you specify both lower and upper bounds, and their values depend on the implicit comparison verbs you're using, e.g. "min < vals <: max" to partition without overlaps (versus "min <: vals < max").

On Tue, Jan 10, 2012 at 11:58 AM, Roger Hui wrote:

> Thanks to you and all other respondents for their helpful replies.
>
> Do Moore & McCabe offer any guidance on how to compute the medians?
> Wikipedia says "there is no universal agreement on choosing the quartile
> values". As well, in computing the IQR I have seen methods that make sense
> to me but are quite tricky depending on whether #x is odd or even.
>
> e.g. Suppose x is 1 2 3 4 5, 6 7 8 9 10. The descriptions I have seen say
> that the median is 5.5. When you then compute the median of the lower half
> (q1), you exclude the 5.5, and report that q1 is 3, and likewise q3 is 8.
> In contrast, if y is 1 2 3 4 5, 6, 7 8 9 10 11, the median is 6, but when
> you compute q1 you *include* the 6 with the lower half, and report that q1
> is 3.5, likewise you include 6 with the upper half so that q3 is 8.5.
>
> On Tue, Jan 10, 2012 at 5:54 AM, km wrote:
>
> > Also Moore and McCabe give explicit directions for finding Q1 and Q3:
> >
> > -------------
> > To calculate the quartiles:
> >
> > 1. Arrange the observations in increasing order and locate the median M
> > in the ordered list of observations.
> >
> > 2. The first quartile Q1 is the median of the observations whose position
> > in the ordered list is to the left of the location of the overall median.
> >
> > 3. The third quartile Q3 is the median of the observations whose position
> > in the ordered list is to the right of the location of the overall median.
> > --------------
> >
> > The five-number summary, another Tukey invention, is
> >
> > Minimum Q1 M Q3 Maximum
> >
> > For the tiny data set 1 2 2 4 6 the five-number summary is
> >
> > 1 1.5 2 5 6
> >
> > Kip Murray
> >
> > Sent from my iPad
> >
> > On Jan 9, 2012, at 11:27 PM, Brian Schott wrote:
> >
> > > Yes, I agree with Kip.
> > >
> > > Some of my Tukey speak was incorrect. Tukey called the thresholds of
> > > 1.5*IQR "inner fences" and 3*IQR "outer fences", and the violating data
> > > points were "outside" or "far out" depending on how far out they were.
> > > So the Tukey multipliers were 1.5 and 3.0 or if you use his term for
> > > 1.5*IQR ("step"), then the multipliers are 1 and 2 steps.
> > >
> > > And btw, the lower and upper end of the box are the "hinges". I just
> > > dug out my copy of Tukey's Exploratory Data Analysis, Addison-Wesley
> > > 1977.
> > >
> > > On Tue, Jan 10, 2012 at 12:16 AM, km wrote:
> > >> From Moore and McCabe, Introduction to the Practice of Statistics
> > >> (2003) p. 46
> > >>
> > >> --------------
> > >>
> > >> The interquartile range IQR is the distance between the first and
> > >> third quartiles.
> > >> IQR = Q3 - Q1
> > >>
> > >> The 1.5 x IQR Criterion for Outliers
> > >> Call an observation a suspected outlier if it falls more than 1.5 x IQR
> > >> above the third quartile or below the first quartile.
> > >>
> > >> ---------------
> > >>
> > >> (You should informally investigate suspected outliers, looking for a
> > >> reason to throw them out.)
> > >>
> > >> Kip Murray
> > >>
> > >> Sent from my iPad
> > >>
> > >> On Jan 9, 2012, at 6:49 PM, Roger Hui wrote:
> > >>
> > >>> I wonder if there are well-known techniques in statistics for dealing
> > >>> with the following problem.
> > >>>
> > >>>    t
> > >>> 11 10 10 10 10 11 10 10 10 10 9 11 10 11 10 10 11 10 11 10 11 10 10
> > >>> 11 10 11 10 10 10 11 10 74 11 11 14 11 11 10 12 11 15 14 12 11
> > >>> 11 11 11 11 10 12 11 11 11 10 11 11 11 10 11 11 10 11 161241 49
> > >>> 32 12 11 11 12 10 11 10 12 11 12 11 11 12 11 11 12 11 11 11 12
> > >>> 11 11 12 11 11 11 11 11 11 11 10 11 11 12 12
> > >>>
> > >>> t is a set of samples from a noisy source which is supposed to give the
> > >>> same integer answer. Obviously, 161241 is an "outlier", and it is likely
> > >>> that 74, 49, or even 32 are outliers too. Are there standard techniques
> > >>> for discarding outliers to clean up the data, before the application of
> > >>> statistical tests such as the means test or large sample test?
> > > ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>

--
Devon McCormick, CFA
^me^ at acm. org is my preferred e-mail
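As a rough check of the Moore and McCabe directions and the 1.5 x IQR criterion quoted in this thread, here is a minimal Python sketch. The function names `quartiles_mm` and `suspected_outliers` are my own, the data `t` is transcribed by hand from Roger's message, and this is one reading of the quoted rule (the overall median's position is excluded from both halves), not the only possible one.

```python
import statistics

def quartiles_mm(xs):
    """Return (Q1, M, Q3), where Q1 and Q3 are the medians of the
    observations strictly left and right of the overall median's position.
    Assumes at least two observations."""
    s = sorted(xs)
    half = len(s) // 2
    lower = s[:half]
    # Odd length: the middle observation is the median and belongs to
    # neither half. Even length: the list splits cleanly in two.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return statistics.median(lower), statistics.median(s), statistics.median(upper)

def suspected_outliers(xs):
    """Observations more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles_mm(xs)
    reach = 1.5 * (q3 - q1)
    return [x for x in xs if x < q1 - reach or x > q3 + reach]

# Roger's samples, transcribed from the thread (assumed to be copied correctly).
t = [
    11, 10, 10, 10, 10, 11, 10, 10, 10, 10, 9, 11, 10, 11, 10, 10, 11, 10, 11, 10, 11, 10, 10,
    11, 10, 11, 10, 10, 10, 11, 10, 74, 11, 11, 14, 11, 11, 10, 12, 11, 15, 14, 12, 11,
    11, 11, 11, 11, 10, 12, 11, 11, 11, 10, 11, 11, 11, 10, 11, 11, 10, 11, 161241, 49,
    32, 12, 11, 11, 12, 10, 11, 10, 12, 11, 12, 11, 11, 12, 11, 11, 12, 11, 11, 11, 12,
    11, 11, 12, 11, 11, 11, 11, 11, 11, 11, 10, 11, 11, 12, 12,
]

print(quartiles_mm([1, 2, 2, 4, 6]))  # Kip's tiny data set
print(quartiles_mm(t))
print(suspected_outliers(t))
```

For 1 2 2 4 6 this reproduces Kip's Q1 = 1.5, M = 2, Q3 = 5. Note that for Roger's odd-length example 1..11 it gives Q1 = 3 and Q3 = 9, not the 3.5 and 8.5 obtained when the median is included in both halves. On t, by my transcription, Q1 is 10 and Q3 is 11, so the IQR is only 1 and the criterion flags the 14s and the 15 along with 32, 49, 74 and 161241.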
