sci.stat.edu people: there have been other replies to the original
post, in sci.stat.math.

On 21 Apr 2004 09:10:14 -0500, [EMAIL PROTECTED] (Herman
Rubin) wrote:

> In article <[EMAIL PROTECTED]>, S Fan <[EMAIL PROTECTED]> wrote:
> >If you plot the residuals and the residuals seem to be getting bigger
> >(or smaller), then you may need a transformation.
> >When doing regression, one assumption is that the data follow a
> >constant (though unknown) sigma. 
> >Hope it helps.
> >S Fan 
> 
> This is NOT the most important assumption; one can modify
> the regression approach to take it into account.  The MOST
> important assumption is that the relationship is a linear
> relationship, with the "errors" independent of (or at least
> uncorrelated with) the predictors.  Non-trivial transformations
> are extremely unlikely to preserve this property.

Herman is accustomed to data that *have* these properties of
linearity and independent errors at the start.  He is also facile
with non-linear analyses where he knows how to accommodate
the error structure directly -- something not always easy to do, and
sometimes easier to do than to *explain* to an audience that
is not sophisticated with numbers.

My experience is different from his.  In clinical research, bioassays
(for one instance) have units of 'concentration', but the proper unit
of measurement, with the properties he mentions, is apt to be the
log().  The proper unit for a growth curve is apt to be the logit.
Bioassay is an area with a long and healthy tradition of
transformations; check any textbook.
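
To make those two re-expressions concrete, here is a minimal sketch
in Python (numpy); the data values are invented for illustration:

import numpy as np

# Hypothetical bioassay concentrations: the spacing is multiplicative
# (serial dilutions), so the log puts the doses at equal intervals.
conc = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
log_conc = np.log(conc)          # the base of the log only shifts the scale

# Hypothetical growth-curve responses, as fractions of the maximum.
p = np.array([0.05, 0.20, 0.50, 0.80, 0.95])
logit_p = np.log(p / (1 - p))    # the logit stretches 0..1 onto the whole line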

Tukey provided a rule of thumb for data with a natural zero:  if the
largest value is 10 or 20 times the smallest, then you probably
want to transform.  Tukey also provided other guidelines,
talking about 'folded' transformations such as the logit, and
about the family of power transformations.
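
A rough sketch of that rule of thumb in Python, with scipy's Box-Cox
routine standing in for the power family (the threshold of 10 and the
data are my own illustration, not anything Tukey prescribed):

import numpy as np
from scipy import stats

y = np.array([0.8, 1.5, 3.0, 6.5, 13.0, 26.0])   # invented positive data

ratio = y.max() / y.min()
if ratio >= 10:                       # Tukey's rough 10-to-20 threshold
    y_trans, lam = stats.boxcox(y)    # picks a power by maximum likelihood
    print(f"max/min = {ratio:.1f}, Box-Cox lambda = {lam:.2f}")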

Some people are fond of the rank-transformation:  that is, in my
opinion, the useful way to describe a large fraction of the
'non-parametric' alternatives, which I avoid when I can.
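
The transformation itself is one line; a sketch in Python (scipy),
with made-up numbers:

import numpy as np
from scipy import stats

y = np.array([3.1, 0.4, 12.0, 5.5, 5.5])
ranks = stats.rankdata(y)    # ties share the average rank: [2. 1. 5. 3.5 3.5]

Running the usual ANOVA or regression on such ranks reproduces, at
least approximately, many of the classical rank tests, which is what
makes 'rank transformation' a fair label for that family.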

Finally, some people like arbitrary transformations, including
adding arbitrary constants before taking the log or power:
What I am thinking of are the ones with the single virtue
of giving residuals that are apparently normal, for the data
on hand -- that is done in order to improve (or justify) the use
of the F-test.  The nominal p-level is not achieved if you do not
meet the assumption about the residuals, and such a transformation
does meet it, at least on the face of it.
I can admit that I did that a time or two, a long time ago,
and I might someday do it again.
However, the F-test will be more simply wrong if, say,
the linearity is fouled up by the transformation, making the
coefficients wrong and mis-measuring the error.  I don't know
whether I avoid 'arbitrary transformations' because of that, or
because they are inelegant and hard to justify to anyone else.
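
For concreteness, here is what such an 'arbitrary' choice can look
like -- a sketch only; the grid of constants and the Shapiro-Wilk
criterion are my illustration, not a recommended procedure:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)           # invented predictor
y = 2.0 * x + rng.gamma(2.0, x)          # invented, right-skewed response

best_c, best_p = None, -1.0
for c in (0.01, 0.1, 0.5, 1.0, 5.0, 10.0):   # arbitrary constants to try
    z = np.log(y + c)
    fit = stats.linregress(x, z)
    resid = z - (fit.intercept + fit.slope * x)
    p = stats.shapiro(resid).pvalue      # how 'normal' do the residuals look?
    if p > best_p:
        best_c, best_p = c, p

print(f"chosen c = {best_c}, Shapiro-Wilk p = {best_p:.2f}")
# The residuals may now pass a normality check, but the x-y relation
# can be bent away from linearity -- the objection raised above.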

> 
> >On 19 Apr 04 03:24:58 -0400 (EDT), opaow wrote:
> >>Hi. I am just quite confused about data transformations (especially in
> >>doing ANOVA and Regression)... When and why do we transform data?
> >>Any help? I'm not quite good at it... thanks in advance.


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html