"Donald F. Burrill" wrote:

> On Fri, 26 Nov 1999, Frank E Harrell Jr wrote:
>
> > Beware - you can't just anti-log the mean and s.d.  The median
> > unlogged value is the antilog of the mean of the logged values.
>
> That's interesting.  The antilog of the mean of log(X) is the geometric
> mean of X.  Is the geometric mean necessarily also the median?  I
> wouldn't have thought so, but haven't had occasion to consider the matter
> before now.  I'll have to think about that some.  (For one thing, the
> geometric mean, like the arithmetic mean, has a unique, well-defined
> value;  the median needn't even exist and when it exists it needn't be
> unique, although there exist algorithms that generate a unique value,
> always by introducing an extra assumption or few, and mainly (one
> suspects) for the benefit of those who have a low tolerance for ambiguity.)

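Yes, the geometric mean estimates the median of Y if Y has a log-normal
distribution: log(Y) is then symmetric, so its mean equals its median, and
the antilog of a median is the median of the antilogs.  Beware of the
non-robustness of the geometric mean, though.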
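
To make both points concrete, here is a minimal simulation sketch.  The
parameter values, the seed, and the helper name geometric_mean are my own
illustrative choices, and the "bad" observation is a made-up recording
error, not real data.

```python
import math
import random
import statistics


def geometric_mean(values):
    """Antilog of the arithmetic mean of the logs; every value must be > 0."""
    return math.exp(statistics.fmean(math.log(v) for v in values))


rng = random.Random(1999)      # arbitrary fixed seed, for reproducibility
mu, sigma = 1.0, 0.8           # hypothetical mean and s.d. of log(Y)

# Large lognormal sample: geometric mean and sample median both estimate exp(mu).
y = [rng.lognormvariate(mu, sigma) for _ in range(20000)]
gm, med = geometric_mean(y), statistics.median(y)
print(f"geometric mean {gm:.3f}   median {med:.3f}   exp(mu) {math.exp(mu):.3f}")

# Non-robustness: in a small sample, one value recorded as nearly zero pulls
# the geometric mean down sharply, while the median can shift by at most one
# position in the ordered sample.
small = y[:50]
bad = [1e-8] + small[1:]
gm_small, gm_bad = geometric_mean(small), geometric_mean(bad)
med_small, med_bad = statistics.median(small), statistics.median(bad)
print(f"clean: geometric mean {gm_small:.3f}   median {med_small:.3f}")
print(f"bad:   geometric mean {gm_bad:.3f}   median {med_bad:.3f}")
```

The agreement in the first comparison depends on the data really being
lognormal; for other distributions the two quantities need not coincide.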

>
>
> > The mean unlogged value is something like exp(mean logged + .5sigma2)
> > where sigma2=sd of logged values.
>
> Did you mean "sigma2 = variance of logged values"?  (Why else represent
> it as "sigma2" instead of just as "sigma"?)
> I take this sentence to mean
>         X-bar  =  exp(mean(logX) + .5(var(logX))), approximately.

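Right: sigma2 is the variance of the logged values, and the mean inside
exp() is the mean of the logged values.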

>
>  (Or should I be thinking of the population mean of X instead of the
> sample mean?)  Is this true in general, or only if X is actually
> lognormally distributed?

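The formula gives the population mean of a lognormal X; I'm inserting
sample estimates for the population parameters.  It is exact only if X is
actually lognormal.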
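
A minimal sketch of that plug-in calculation, assuming the data really are
lognormal.  The function name lognormal_mean_estimate, the seed, and the
parameter values are hypothetical choices for illustration only.

```python
import math
import random
import statistics


def lognormal_mean_estimate(values):
    """Plug-in estimate exp(m + s2/2), where m and s2 are the sample mean and
    sample variance of log(values).  Relies on log(values) being normal."""
    logs = [math.log(v) for v in values]
    m = statistics.fmean(logs)
    s2 = statistics.variance(logs)   # the variance, not the s.d.
    return math.exp(m + 0.5 * s2)


rng = random.Random(26)        # arbitrary fixed seed
mu, sigma = 1.0, 0.8           # hypothetical parameters of log(X)
x = [rng.lognormvariate(mu, sigma) for _ in range(20000)]

true_mean = math.exp(mu + 0.5 * sigma ** 2)
plug_in = lognormal_mean_estimate(x)
naive = math.exp(statistics.fmean(math.log(v) for v in x))   # antilog of mean log
print(f"true mean {true_mean:.3f}   plug-in {plug_in:.3f}   "
      f"sample mean {statistics.fmean(x):.3f}   naive antilog {naive:.3f}")
```

The naive antilog lands near the median exp(mu) rather than the mean, which
is why the .5sigma2 term is needed.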

>
>
> > This is also a very assumption-laden approach (logarithm works, logged
> > values are normal with constant variance).
>
> Most approaches seem to entail assumptions of some kind.  By "logarithm
> works" I take it you mean "X is everywhere positive"?  Not a severe
> restriction for many kinds of data, I think.  But it's not clear why one
> would assume "logged values are normal" if one were trying to find out
> whether the original data were lognormal?  (And if the variance weren't
> constant, surely the values in question would not be normal?)

Right, that's the bigger point.  Who's to say that the log transformation is
likely to yield normality?

>
>
> > A more nonparametric approach based on smoothers and on Duan's smearing
> > estimator (to obtain mean on original scale) is worth investigating.
>
> Here as elsewhere, you seem to be aiming at seeking a value for the mean
> of X, X being the originally observed variable.  If that were one's main
> interest, why bother to transform at all?  The whole point (or at least
> one of the principal ones) of transforming is to end up with a variable
> that is more conveniently distributed than the one you started with;  it
> would follow that estimates of the parameters of the transformed
> distribution would ordinarily be of more interest than those of the
> original, although sometimes the back-transformed estimates may have
> useful interpretations.

Transforming is still of interest when doing regression modeling.  -Frank
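
For the regression case, the smearing idea mentioned above can be sketched
roughly as follows.  This is only an illustration under stated assumptions:
a straight-line least-squares fit on the log scale stands in for the
smoother, the errors are simulated as uniform (so deliberately not normal),
and fit_line, smearing_factor, and all numeric values are hypothetical.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def smearing_factor(residuals):
    """Duan's smearing factor: the mean of exp(residual), log-scale residuals."""
    return statistics.fmean(math.exp(r) for r in residuals)


rng = random.Random(70)                 # arbitrary fixed seed
a_true, b_true = 0.5, 0.3               # hypothetical coefficients for log(Y)
x = [rng.uniform(0.0, 5.0) for _ in range(20000)]
# Log-scale errors are uniform on (-1.5, 1.5): symmetric but not normal.
log_y = [a_true + b_true * xi + rng.uniform(-1.5, 1.5) for xi in x]

a, b = fit_line(x, log_y)
residuals = [ly - (a + b * xi) for xi, ly in zip(x, log_y)]
smear = smearing_factor(residuals)

x0 = 2.0
naive = math.exp(a + b * x0)            # estimates the conditional median here
smeared = naive * smear                 # smearing estimate of E[Y | x = x0]
# For uniform(-c, c) errors, E[exp(error)] = sinh(c) / c.
true_mean = math.exp(a_true + b_true * x0) * math.sinh(1.5) / 1.5
print(f"naive {naive:.3f}   smeared {smeared:.3f}   true mean {true_mean:.3f}")
```

The smearing factor uses the residuals themselves instead of a normal-theory
term, so it does not need the logged values to be normal, though it still
assumes the residual distribution is the same at every x.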

>
>                         -- DFB.
>  ------------------------------------------------------------------------
>  Donald F. Burrill                                 [EMAIL PROTECTED]
>  348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
>  MSC #29, Plymouth, NH 03264                                 603-535-2597
>  184 Nashua Road, Bedford, NH 03110                          603-471-7128

--
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
