wuzzy wrote:
> 
> How do you dichotomize a continuous variable, like age, into 3 decades
> of life.
> And is this a good thing to do in linear regression?

        No. Reasons:

        (a) by clustering into decades in *any* way, you throw away significant
information [I would say roughly 10% in this case, based on the share of
the squared variation in age that lies within decades, but this is a
handwave.]
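        The ~10% figure can be checked under one simple assumption: ages are spread evenly over 0-30. For a uniform distribution the variance within a 10-year band is 10^2/12 and over the whole 30-year range it is 30^2/12, so the decade coding hides 1/9 (about 11%) of the variance in age. If the response is linear in age, that is also the share of the age-driven signal the decades cannot capture. A minimal Python sketch (the evenly spaced grid stands in for a uniform distribution; the variable names are illustrative):

```python
from statistics import fmean, pvariance

# Assumption: ages evenly spread over 0-30 (a fine grid approximates a uniform distribution).
ages = [i / 100 for i in range(3000)]      # 0.00, 0.01, ..., 29.99
decade = [int(a // 10) for a in ages]      # 0, 1, 2

# Replace each age by the mean age of its decade: this is all a decade coding can "see".
means = {d: fmean(a for a, dd in zip(ages, decade) if dd == d) for d in set(decade)}
binned = [means[d] for d in decade]

total = pvariance(ages)                                     # total variance of age
within = fmean((a - b) ** 2 for a, b in zip(ages, binned))  # variance left inside decades
print(f"fraction of age variance lost: {within / total:.3f}")  # prints 0.111
```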

        (b) by dichotomizing ("ISUNDER10={0,1}", "ISTEEN={0,1}") rather than
trichotomizing ("DECADE={1,2,3}"), you throw away further information:
that teenagers lie *between* kids & adults in age.
 
> Is it just
> 
> V1:  ages0-10=1 else =0
> V2:  ages11-20=1 else =0
> v3:  ages21-30=1 else =0

        This is definitely wrong, as the third variable is redundant (V3 =
1-V1-V2); with the usual intercept term in the model the three columns
are perfectly collinear, which will mess up primitive regression software
& get you snickered at by more advanced routines.
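        The redundancy can be checked directly: with an intercept column, the three proposed indicators sum to the intercept in every row, so the design matrix loses one rank and X'X is singular. A minimal sketch with exact rational arithmetic, assuming integer ages 0-30 and the boundaries from the question; the `indicators` and `rank` helpers are hypothetical, written here for illustration:

```python
from fractions import Fraction

def indicators(age):
    # V1, V2, V3 as proposed in the question (integer ages assumed).
    return [int(0 <= age <= 10), int(11 <= age <= 20), int(21 <= age <= 30)]

def rank(rows):
    # Exact Gauss-Jordan elimination over the rationals.
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier columns
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

ages = range(31)
with_v3 = [[1] + indicators(a) for a in ages]         # intercept, V1, V2, V3
without_v3 = [[1] + indicators(a)[:2] for a in ages]  # intercept, V1, V2

print(len(with_v3[0]), rank(with_v3))        # 4 3: rank-deficient, X'X singular
print(len(without_v3[0]), rank(without_v3))  # 3 3: full column rank
```

Dropping any one indicator (here V3, which becomes the reference group) restores full rank.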

> I want to do this because I think this would reduce my variation
> within groups,(measurement error)

        You may (mind you, I said "may") get a lower standard error; however,
this is illusory and more than offset by the reduction in numbers. If
r^2 is big, the conversion of genuine age variation within decades into
apparent "error" [= variation not accounted for by the model] may
outweigh even this.

> also I think that the effect of the
> different decades is not linear within the groups.

        If this is the case, you would do better to consider one of the
following:

        (a) find a transformation that linearizes the effect
        (b) fit a polynomial model or other ad-hoc nonlinear model
        (c) use nonparametric techniques.
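        As an illustration of option (b), a sketch assuming a hypothetical U-shaped effect of age: a quadratic in age and the decade-indicator model each use three parameters, but the quadratic follows the curvature within decades while the decade means cannot. `polyfit` here is a hand-rolled helper using the normal equations, not a library function:

```python
import random
from statistics import fmean

def polyfit(x, y, degree):
    """Least-squares polynomial coefficients (lowest power first) via the
    normal equations; fine for a low-degree illustration, though QR-based
    solvers are numerically safer in real work."""
    p = degree + 1
    X = [[xi ** k for k in range(p)] for xi in x]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * ac for a, ac in zip(A[i], A[c])]
                b[i] -= f * b[c]
    return [b[i] / A[i][i] for i in range(p)]

random.seed(3)
ages = [random.uniform(0, 30) for _ in range(300)]
# Assumed truth: a U-shaped effect of age, so it is not linear within decades.
y = [0.1 * (a - 15) ** 2 + random.gauss(0, 2) for a in ages]

# Option (b): quadratic in centred age, 3 parameters.
x = [a - 15 for a in ages]
coef = polyfit(x, y, 2)
rss_quad = sum((yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))

# Decade indicators + intercept (equivalently the three decade means), also 3 parameters.
groups = [min(int(a // 10), 2) for a in ages]
means = {g: fmean(yi for gi, yi in zip(groups, y) if gi == g) for g in set(groups)}
rss_dec = sum((yi - means[g]) ** 2 for g, yi in zip(groups, y))

print(rss_quad < rss_dec)  # True
```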

        -Robert Dawson
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
