Dennis Roberts writes:

>> There's a trade-off here. By removing the middle third, you increase
>> the separation of the two groups, which is good

> why is this good?

Think of a table with the legs close to the center. It is unstable. Push the
legs out a bit and it becomes more stable.

>> I would not be as critical as some of the others on the list. 
>> Sometimes a categorical variable is easier to interpret. A lot
>> of dietary research, for example, looks at the highest quintile
>> of fat consumption and compares it to the lowest quintile. I
>> can visualize those two groups pretty well.

> i would be interested in how this visualization would look ...
> in terms of fat consumption ... we can't really visualize can
> we, volume of fat very well ...

I visualize fat consumption fairly easily. The top quintile eats things like
pasta alfredo and the bottom quintile eats things like tofu.

Maybe a hypothetical example would help. Let's suppose we measure fat
consumption as the percentage of calories from fat.

For the sake of argument, let's say that the top quintile consumes 45% or
more of calories from fat and the bottom quintile consumes 15% or less of
calories from fat. I am interested in the probability of a heart attack in a
five year period. Suppose that when we compare the top and bottom quintiles
we get an odds ratio of 2.4. That tells us that people who get 45% or more
of their calories from fat are much more likely to have a heart attack than
people who get 15% or less of their calories from fat.

Compare that to the odds ratio computed on a continuous scale. It would
probably be around 1.03, which tells us that each extra percentage point of
calories from fat in the diet is associated with an increase of about 3% in
the odds of a heart attack.

Both interpretations are reasonable, but I find the first one a bit easier
to visualize.
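If it helps, the arithmetic connecting the two numbers is easy to check. The
sketch below uses the hypothetical cutoffs above (45% and 15%) and assumes
the log odds is linear in the fat percentage:

```python
# Hypothetical numbers from the example above: the top quintile gets
# >= 45% of calories from fat, the bottom quintile <= 15%, a 30-point gap.
gap = 45 - 15

# A continuous odds ratio of about 1.03 per percentage point, compounded
# over the 30-point gap, reproduces the quintile-contrast odds ratio.
or_per_point = 1.03
or_quintiles = or_per_point ** gap
print(round(or_quintiles, 1))  # roughly 2.4
```

So the two odds ratios are really the same story told at different scales.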

>> Furthermore, categorization mitigates some of the problems caused by
>> measurement error.

> how can this be? what if you want to estimate the true score for
> someone in the middle? you have eliminated all the middle data
> ... in a sense, the standard error of measurement has no meaning
> in this midrange anymore since, you estimate of it based on the
> top and bottom only groups ignores the middle

Suppose someone reports their fat consumption as 50% when it is actually 55%.
If you categorize the responses, the measurement error disappears entirely
for that person: both values land in the top quintile. Of course, a few
people close to a boundary will be misclassified, so dichotomization doesn't
completely solve the problem.
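Here's a tiny sketch of that point, again using the hypothetical 45%/15%
cutoffs from the earlier example:

```python
def quintile_group(pct_fat):
    """Classify fat intake using the hypothetical cutoffs above."""
    if pct_fat >= 45:
        return "top"
    if pct_fat <= 15:
        return "bottom"
    return "middle"

# A 5-point reporting error: 50% reported vs. 55% true. Both fall in the
# top group, so categorization absorbs the error entirely.
print(quintile_group(50) == quintile_group(55))  # True

# But the same 5-point error near the 45% boundary misclassifies:
print(quintile_group(44) == quintile_group(49))  # False
```

The error only matters for people near a cutoff; everyone else is classified
correctly no matter how noisy their report is.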

I heard Richard Peto give a talk on measurement error, which he called
dilution bias. A simple approach he suggested involved splitting the data
into thirds. Suppose you are looking at the effect of cholesterol on
mortality. Estimate the average cholesterol measure for the third of the
subjects with the lowest cholesterol values and do the same for the third of
the subjects with the highest cholesterol values. Compute the difference.
Say it is 100 units. If the odds ratio for mortality is 2.0, then you might
be tempted to say that a 100 unit decline in cholesterol will cause a
halving of mortality. This is equivalent to a 1.07 odds ratio for every 10
units of cholesterol.

But actually cholesterol measurements are notoriously unreliable. So what
you do is to get a repeat measurement of cholesterol in the lower third and
in the upper third. With regression to the mean, we know that the two groups
will be closer on the second measurement than on the first. Say there is only
an 80 unit difference in the repeat measures. Then the odds ratio of 2.0 actually
tells us that an 80 unit decline in cholesterol will cause a halving of
mortality. This is equivalent to a 1.09 odds ratio for every 10 units of
cholesterol.

This fits well with your intuition, because measurement error tends to
attenuate the odds ratio.
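The rescaling is just an exponent, so the whole correction fits in a few
lines. The numbers below are the hypothetical ones from the cholesterol
example:

```python
# Hypothetical numbers from the cholesterol example: odds ratio of 2.0
# comparing the top and bottom thirds.
or_observed = 2.0
diff_first = 100.0   # group difference on the first measurement
diff_repeat = 80.0   # the difference shrinks on a repeat measurement

# Per-10-unit odds ratio, naive and corrected: the same odds ratio of
# 2.0 is spread over a smaller "true" difference after the correction.
or_naive = or_observed ** (10 / diff_first)
or_corrected = or_observed ** (10 / diff_repeat)
print(round(or_naive, 2), round(or_corrected, 2))  # 1.07 1.09
```

Spreading the same odds ratio over a smaller true difference gives a larger
per-unit effect, which is exactly the direction the attenuation argument
predicts.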

Now this is not the only way, nor necessarily the best way, to adjust for
measurement error, but it is simple and very intuitive.

Hey, I'm 90% in agreement with what everyone else has said. If power and
precision are the only considerations then there is no justification for
dichotomization. But when other considerations enter in, dichotomization is
not uniformly bad.

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
The STATS web page has moved to
http://www.childrens-mercy.org/stats.
