Dennis Roberts writes:

>> There's a trade-off here. By removing the middle third, you increase
>> the separation of the two groups, which is good.
> why is this good?

Think of a table with its legs close to the center: it is unstable. Push the legs out a bit and it becomes more stable.

>> I would not be as critical as some of the others on the list.
>> Sometimes a categorical variable is easier to interpret. A lot
>> of dietary research, for example, looks at the highest quintile
>> of fat consumption and compares it to the lowest quintile. I
>> can visualize those two groups pretty well.

> i would be interested in how this visualization would look ...
> in terms of fat consumption ... we can't really visualize can
> we, volume of fat very well ...

I visualize fat consumption fairly easily. The top quintile eats things like pasta alfredo and the bottom quintile eats things like tofu.

Maybe a hypothetical example would help. Suppose we measure fat consumption as the percentage of calories from fat, and say, for the sake of argument, that the top quintile consumes 45% or more of its calories from fat and the bottom quintile 15% or less. I am interested in the probability of a heart attack in a five year period.

Suppose we compare the top and bottom quintiles and get an odds ratio of 2.4. This tells us that people who consume 45% or more of their calories from fat are much more likely to have a heart attack than people who consume 15% or less.

Compare that to an odds ratio computed on the continuous scale. It would probably be around 1.03, which tells us that each extra percentage point of calories from fat is associated with an increase of about 3% in the odds of a heart attack.

Both interpretations are reasonable, but I find the first one a bit easier to visualize.

>> Furthermore, categorization mitigates some of the problems caused by
>> measurement error.

> how can this be? what if you want to estimate the true score for
> someone in the middle? you have eliminated all the middle data
> ... in a sense, the standard error of measurement has no meaning
> in this midrange anymore since your estimate of it, based on the
> top and bottom groups only, ignores the middle

Suppose someone reports their fat consumption as 50% when it is actually 55%. If you categorize the responses, the measurement error disappears entirely for that person. A few people who are close to the boundary are misclassified, so dichotomization doesn't completely solve the problem.

I heard Richard Peto give a talk on measurement error, which he called dilution bias. A simple adjustment he suggested involves splitting the data into thirds. Suppose you are looking at the effect of cholesterol on mortality. Estimate the average cholesterol for the third of the subjects with the lowest cholesterol values, do the same for the third with the highest values, and compute the difference. Say it is 100 units. If the odds ratio for mortality is 2.0, you might be tempted to say that a 100 unit decline in cholesterol will halve mortality. This is equivalent to an odds ratio of about 1.07 for every 10 units of cholesterol.

But cholesterol measurements are notoriously unreliable. So what you do is get a repeat measurement of cholesterol in the lower third and in the upper third. By regression to the mean, the two groups will be closer on the second measurement than on the first. Say there is only an 80 unit difference in the repeat measures. Then the odds ratio of 2.0 actually tells us that an 80 unit decline in cholesterol will halve mortality, which is equivalent to an odds ratio of about 1.09 for every 10 units of cholesterol. This fits well with your intuition, because measurement error tends to attenuate the odds ratio.

Now this is not the only way, or necessarily the best way, to adjust for measurement error, but it is simple and very intuitive.

Hey, I'm 90% in agreement with what everyone else has said.
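Incidentally, both calculations above (the quintile-contrast odds ratio and the regression-dilution adjustment) rest on the same arithmetic: raising the odds ratio for a wide contrast to a fractional power to get a per-unit odds ratio, which assumes a log-linear (logistic) dose-response. A minimal Python sketch, using the hypothetical numbers from my examples rather than any real data:

```python
# Rescale an odds ratio observed for a contrast spanning `width`
# exposure units into an odds ratio per `unit` exposure units,
# assuming a log-linear (logistic) dose-response.
def rescale_or(contrast_or, width, unit=1.0):
    return contrast_or ** (unit / width)

# Fat example: OR 2.4 for the 30-point gap between quintiles
# (45% vs. 15% of calories from fat).
print(round(rescale_or(2.4, 30), 2))            # about 1.03 per point

# Cholesterol example: OR 2.0 for the top-vs-bottom thirds.
print(round(rescale_or(2.0, 100, unit=10), 2))  # naive: about 1.07 per 10 units
print(round(rescale_or(2.0, 80, unit=10), 2))   # adjusted: about 1.09 per 10 units
```

The regression-dilution adjustment is just the same rescaling with the repeat-measurement difference (80 units) substituted for the noisy first-measurement difference (100 units).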
If power and precision are the only considerations, then there is no justification for dichotomization. But when other considerations enter in, dichotomization is not uniformly bad.

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
The STATS web page has moved to http://www.childrens-mercy.org/stats.

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
