Jay Tanzman <[EMAIL PROTECTED]> wrote:
: Dr. Harrell, please bottom post rather than top post. It makes the thread
: easier to follow.
:
: I was responding to your question, "Is a piecewise flat relationship more
: realistic than a linear one?" when I wrote, "Well, yeah, I would say so. If the
: relationship is U-shaped, say, then re-coding a continuous predictor variable
: into 5 categories will provide a better fit." To which you are responding:
:
: Frank E Harrell Jr wrote:
:>
:> Definitely not. Categorizing into 5 levels takes 4 degrees of freedom
:> and still assumes a piecewise flat relationship. Regression splines
:> handle U-shapes as well as other smooth shapes, with typically < 4 d.f.,
:> and they provide better fits.
:
: All I am saying is that the piecewise flat relationship would provide a better
: fit to a U-shaped relationship than a linear one would. I'm not arguing that
: splines wouldn't be better still. As for the degrees of freedom issue, in
: epidemiology, we often don't care, as our datasets often have tens of thousands
: of cases.
:
: -Jay

OTOH, fewer df is more attractive for its parsimony, no? In that sense,
I always prefer a nice 1 or 2 df function over lots of categories, even
with big N. If df really doesn't matter, there is admittedly a point at
which you can make enough categories to capture essentially all the
information in the continuous variable. I think Sam Green has published
some of this in the structural equation/factor analytic literature, but
I'm sure the category folks like Agresti must have said something like
this, too.

Germane to the present discussion--which appears to be dichotomization--
making only two categories is well under the number necessary, and is
fraught with all sorts of other problems. To name a few apart from the
well-known loss of power from the cut itself: dichotomizing increases
measurement error and creates measurement "illogic", e.g., if you
dichotomize systolic blood pressure at 140, it implies that a patient
with an SBP of 140 is more similar to a patient with an SBP of 190 than
to a patient with an SBP of 139.
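The fit-versus-df comparison above can be checked with a small simulation. This is only a sketch under assumptions that are not in the thread: the data are simulated (x uniform on [-2, 2], y = x^2 plus normal noise), the five categories are quintiles of x, and a plain quadratic (2 df) stands in for the "nice 1 or 2 df function"; it is not a regression spline. The function names are made up for illustration.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit of y on design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def compare_fits(n=2000, seed=0):
    """Compare linear, 5-category, and quadratic fits to a U-shaped relationship.

    All data are simulated; the U shape and noise level are assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n)
    y = x ** 2 + rng.normal(0.0, 0.5, size=n)  # U-shaped truth plus noise
    ones = np.ones(n)

    # Linear in x: 1 df for the predictor.
    linear = np.column_stack([ones, x])

    # Five quintile categories: 4 dummy variables, so 4 df, piecewise flat.
    edges = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
    group = np.digitize(x, edges)  # category index 0..4
    dummies = (group[:, None] == np.arange(1, 5)).astype(float)
    categorical = np.column_stack([ones, dummies])

    # Quadratic in x: 2 df, a smooth curve.
    quadratic = np.column_stack([ones, x, x ** 2])

    return {
        "linear": r_squared(linear, y),
        "categorical": r_squared(categorical, y),
        "quadratic": r_squared(quadratic, y),
    }


if __name__ == "__main__":
    for name, r2 in compare_fits().items():
        print(f"{name:12s} R^2 = {r2:.3f}")
```

Under these assumptions the ordering matches both sides of the exchange: the 5-category fit beats the straight line, and the 2 df smooth curve beats the 4 df categories.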

Most folks are aware of these things, but choose to dichotomize anyway,
believing it is a more conservative test of the hypothesis. What they
might not be aware of is that if you dichotomize two continuous variables
and put them in the same model, you'll wind up with _spurious_ power.
The frequency of falsely rejected nulls increases with the size of the
correlation between the two variables. Maxwell and Delaney published a
very nice paper on this in Psychological Bulletin in the early 90s.

Mike Babyak
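
The spurious-power point can be illustrated with a rough Monte Carlo sketch. Everything specific here is an assumption for illustration, not Maxwell and Delaney's design: two standard normal predictors with correlation rho, an outcome that depends on the first predictor only, median splits, an OLS model with both predictors, and a normal-approximation cutoff of |t| > 1.96 for the second predictor's coefficient. The function name `rejection_rate` is hypothetical.

```python
import numpy as np


def rejection_rate(rho, dichotomize, n=200, n_sims=500, seed=0):
    """Share of simulations that reject H0: coefficient of x2 = 0.

    y depends on x1 only, so with continuous predictors every rejection
    is a false positive. All settings are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    rejections = 0
    for _ in range(n_sims):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = x[:, 0] + rng.standard_normal(n)  # x2 has no effect of its own
        if dichotomize:
            # Median split of each predictor into 0/1.
            x = (x > np.median(x, axis=0)).astype(float)
        X = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        se_x2 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
        if abs(beta[2] / se_x2) > 1.96:  # normal approximation to the t test
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.7):
        cont = rejection_rate(rho, dichotomize=False)
        dich = rejection_rate(rho, dichotomize=True)
        print(f"rho={rho:.1f}  continuous={cont:.3f}  dichotomized={dich:.3f}")
```

With the continuous predictors the rejection rate for x2 should stay near the nominal 5% whatever rho is; after median splits it should climb well above that as rho grows, because the split version of x1 no longer carries all of x1's information and the split x2 picks up the remainder.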
