of course, if one has control over the data, checking the coding and making 
sure it is correct is a good thing to do

if you do not have control over that, then there may be very little you can 
do with it and in fact, you may be totally UNaware of an outlier problem

i see a potentially MUCH larger problem when ONLY certain summary 
statistics are shown without any basic tallies/graphs displayed ... so, IF 
there are some really strange outlier values, they usually will go undetected ...

correlations are ONE good case in point ... have a look at the following 
scatterplot ... height in inches and weight in pounds ... from the pulse 
data set in minitab


          -  *
          -
       300+
          -
  Weight  -
          -                                                     2
          -                                                 2  224 32
       150+                                       *    * 3458*454322*
          -                                        *53*3*535  2
          -                                          **
            --+---------+---------+---------+---------+---------+----Height
           32.0      40.0      48.0      56.0      64.0      72.0

now, the actual r between X (height) and Y (weight) is -.075 ... and of 
course, this seems strange ... but, IF you had only seen this in a matrix of 
r values, you might say that perhaps there was serious range restriction 
that more or less wiped out the r in this case ... and even the descriptive 
statistics might not adequately tell you of this problem

IF you had the scatterplot, you probably would figure out REAL quick that 
there is a PROBLEM with one of the data points ...

in fact, without that one weird data point, the r is about .8 ... which 
makes a lot better sense when correlating heights and weights of college 
students
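
to make the point concrete, here is a minimal python sketch ... the heights and weights in it are MADE UP for illustration (they are NOT the minitab pulse data), and the one "bad" case (32 inches, 350 pounds) is a hypothetical miscoded value, so the r values it prints will not match the -.075 and .8 quoted above ... pearson_r is just a helper name chosen for this example

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out so nothing is hidden."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# MADE-UP heights (inches) and weights (pounds), NOT the minitab pulse data
heights = [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75]
weights = [110, 125, 115, 130, 120, 140, 135, 150, 145, 160, 150, 170, 165, 185, 175]

# one hypothetical miscoded case, placed far away from everything else
bad_height, bad_weight = 32, 350

all_heights = heights + [bad_height]
all_weights = weights + [bad_weight]

r_clean = pearson_r(heights, weights)
r_all = pearson_r(all_heights, all_weights)

# a basic tally (here just min and max) already exposes the strange value
print("height range with the bad case:", min(all_heights), "to", max(all_heights))
print("r without the bad case:", round(r_clean, 3))
print("r with the bad case:   ", round(r_all, 3))
```

with these invented numbers, the single bad case is enough to turn a strongly positive r into a negative one ... and the min/max tally flags the 32 inch "height" right away, which is the same job the scatterplot does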


At 09:06 PM 2/25/02 +0000, Art Kendall wrote:

>An "outlier" is any value for a variable that is suspect given the
>measurement system, "common sense",  other values for the variable in
>the data set, or  the values a case has on other variables.

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
AC 8148632401



