At 03:24 PM 1/31/03 -0500, Wuensch, Karl L wrote:
When interested in the relationship between two continuous variables, some researchers will dichotomize one of them prior to analysis. I generally discourage such dichotomization, but the practice is common. A colleague asked me today about the practice of dichotomizing by a median split (top half versus bottom half) versus the practice of using only the tails (bottom third versus top third, for example). I outlined my thoughts on this matter and noted that I vaguely recall having read an article or two on this matter long ago, but cannot put my finger on the article(s). Can any of you all?

i don't have any offhand recollection of specific articles but ... here are a few comments ... perhaps you already made them to your colleague
1. i don't think that practice IS that common IF the variable is something akin to a continuous variable ...
2. why toss away information from the data ... ? it is similar to reporting scores on a class test ... as, above the mean or below the mean ... why would students put up with such a crude reporting system?
3. if you use top 1/3 and bottom 1/3 ... you are also throwing data away ... which is worse than just lowering the information value of it in #2
4. more about #2 ... if you dichotomize in this fashion ... you can't go in the other direction ... that is, if i tell you i am a score of "above the median" ... WHERE exactly will that put me? this is like collecting data in RANKED form first ... you can't go back to the original RAW score values
5. as to #3 ... the variance/sd values will be inflated ... is that good? here is an example
i generated 60 values from a normal distribution (mean 50, sd 5) ... rounded the values ... sorted
c1 is the 60 values ... mean, sd, etc.
then i took the top 1/3 and bottom 1/3 ... eliminated the middle 1/3 ... c5 is that set (41 values remain) ... note for one: the sd is larger
the axing of the middle 1/3 makes the data set look more variable than it really is and ... you have lost about a third of your data set (19 of the 60 values) ... why is that a good idea?
IT IS NOT
you might also note that IF you were doing some "inferential" tests with the data ... your standard error of the mean goes UP ... partly because you have a smaller n but also, because you have a larger sd estimate ...
MTB > prin c1
Data Display
C1
39 40 40 42 42 42 43 43 44 44
44 44 45 45 46 46 47 47 47 47
47 48 48 49 49 49 49 49 50 50
50 50 50 50 50 50 51 51 51 51
52 52 53 53 53 53 54 54 54 54
54 56 56 56 57 57 58 59 61 62
MTB > desc c1
Descriptive Statistics: C1
Variable          N       Mean     Median     TrMean      StDev    SE Mean
C1               60     49.617     50.000     49.556      5.256      0.679

Variable    Minimum    Maximum         Q1         Q3
C1           39.000     62.000     46.000     53.000
MTB > prin c5
Data Display
C5
39 40 40 42 42 42 43 43 44 44
44 44 45 45 46 46 47 47 47 47
51 52 52 53 53 53 53 54 54 54
54 54 56 56 56 57 57 58 59 61
62
MTB > desc c5
Descriptive Statistics: C5
Variable          N       Mean     Median     TrMean      StDev    SE Mean
C5               41     49.659     51.000     49.568      6.343      0.991

Variable    Minimum    Maximum         Q1         Q3
C5           39.000     62.000     44.000     54.000
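
for anyone without Minitab, the same comparison can be sketched in Python using only the standard library. This is a rough check, not part of the original session: the list c1 is copied from the C1 display above, and the slice used to build c5 (bottom 20 and top 21 of the sorted values) is an assumption inferred from the C5 display. The helper name describe is made up for illustration.

```python
import math
import statistics

# The 60 sorted, rounded values shown in the C1 display.
c1 = [
    39, 40, 40, 42, 42, 42, 43, 43, 44, 44,
    44, 44, 45, 45, 46, 46, 47, 47, 47, 47,
    47, 48, 48, 49, 49, 49, 49, 49, 50, 50,
    50, 50, 50, 50, 50, 50, 51, 51, 51, 51,
    52, 52, 53, 53, 53, 53, 54, 54, 54, 54,
    54, 56, 56, 56, 57, 57, 58, 59, 61, 62,
]

# Assumption: C5 keeps the bottom 20 and top 21 values of the sorted data,
# which reproduces the 41 values listed in the C5 display.
c5 = c1[:20] + c1[39:]


def describe(values):
    """Return (n, mean, sample sd, standard error of the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample sd (n - 1 in the denominator)
    se = sd / math.sqrt(n)          # SE of the mean = sd / sqrt(n)
    return n, mean, sd, se


for name, col in (("C1", c1), ("C5", c5)):
    n, mean, sd, se = describe(col)
    print(f"{name}: N={n} mean={mean:.3f} sd={sd:.3f} se={se:.3f}")
# C1: N=60 mean=49.617 sd=5.256 se=0.679
# C5: N=41 mean=49.659 sd=6.343 se=0.991
```

the mean barely moves, but the sd goes from 5.256 to 6.343 and the SE of the mean from 0.679 to 0.991, since a larger sd is being divided by the square root of a smaller n.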
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102 Fax: 252-328-6283
mailto:[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
