"Donald F. Burrill" wrote:
> On Fri, 26 Nov 1999, Frank E Harrell Jr wrote:
>
> > Beware - you can't just anti-log the mean and s.d. The median
> > unlogged value is the antilog of the mean of the logged values.
>
> That's interesting. The antilog of the mean of log(X) is the geometric
> mean of X. Is the geometric mean necessarily also the median? I
> wouldn't have thought so, but haven't had occasion to consider the matter
> before now. I'll have to think about that some. (For one thing, the
> geometric mean, like the arithmetic mean, has a unique, well-defined
> value; the median needn't even exist and when it exists it needn't be
> unique, although there exist algorithms that generate a unique value,
> always by introducing an extra assumption or few, and mainly (one
> suspects) for the benefit of those who have a low tolerance for ambiguity.)
Yes, the geometric mean estimates the median of Y if Y has a log-normal
distribution. Beware of the non-robustness of the geometric mean, though:
a single value near zero can pull it down sharply.
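
To make both points concrete, here is a minimal simulation sketch using only
the Python standard library. The seed, the log-scale parameters mu and sigma,
the sample sizes, and the contaminating value 1e-12 are all arbitrary choices
made for illustration, not anything taken from real data.

```python
import math
import random
import statistics

def geometric_mean(values):
    """Antilog of the mean of the logs; every value must be > 0."""
    return math.exp(statistics.fmean([math.log(v) for v in values]))

rng = random.Random(1999)   # arbitrary seed, for reproducibility only
mu, sigma = 1.0, 0.8        # hypothetical log-scale mean and s.d.
sample = [rng.lognormvariate(mu, sigma) for _ in range(20000)]

gm = geometric_mean(sample)
med = statistics.median(sample)
print("population median exp(mu):", math.exp(mu))
print("geometric mean of sample: ", gm)
print("sample median:            ", med)

# Non-robustness: add one value near zero to a small sample.
small = sample[:20]
contaminated = small + [1e-12]
print("geometric mean, 20 values:       ", geometric_mean(small))
print("geometric mean, plus one outlier:", geometric_mean(contaminated))
print("median, 20 values:               ", statistics.median(small))
print("median, plus one outlier:        ", statistics.median(contaminated))
```

With log-normal data the first two estimates should both land near exp(mu).
In the small sample, the single near-zero value moves the geometric mean a
great deal while the median barely changes.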
>
>
> > The mean unlogged value is something like exp(mean unlogged + .5sigma2)
> > where sigma2=sd of logged values.
>
> Did you mean "sigma2 = variance of logged values"? (Why else represent
> it as "sigma2" instead of just as "sigma"?)
> I take this sentence to mean
> X-bar = exp(X-bar + .5(var(logX))), approximately.
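Right: sigma2 is the variance of the logged values, and the mean inside
exp() is the mean of the logged values, not of the unlogged ones.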
>
> (Or should I be thinking of the population mean of X instead of the
> sample mean?) Is this true in general, or only if X is actually
> lognormally distributed?
I'm inserting sample estimates for population parameters.
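
As a rough check of the plug-in version, the sketch below simulates
log-normal data and compares exp(mean of logs + .5 * variance of logs) with
the plain arithmetic mean and with the naive antilog of the mean of the logs.
The seed, mu, sigma, and the sample size are assumptions chosen only for
illustration, and the agreement shown relies on the data really being
log-normal.

```python
import math
import random
import statistics

rng = random.Random(26)     # arbitrary seed
mu, sigma = 1.0, 0.8        # hypothetical log-scale parameters
x = [rng.lognormvariate(mu, sigma) for _ in range(20000)]
logs = [math.log(v) for v in x]

mean_log = statistics.fmean(logs)
var_log = statistics.variance(logs)   # sample variance of the logs, not the s.d.

naive = math.exp(mean_log)                    # estimates the median, not the mean
plug_in = math.exp(mean_log + 0.5 * var_log)  # sample estimates in place of mu, sigma^2
arith = statistics.fmean(x)                   # ordinary sample mean of X
true_mean = math.exp(mu + 0.5 * sigma ** 2)   # population mean under log-normality

print("naive antilog of mean log:", naive)
print("plug-in estimate of mean: ", plug_in)
print("arithmetic mean of X:     ", arith)
print("population mean:          ", true_mean)
```

The naive antilog should sit noticeably below the other three numbers, which
is the reason for the .5 * sigma2 term.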
>
>
> > This is also a very assumption-laden approach (logarithm works, logged
> > values are normal with constant variance).
>
> Most approaches seem to entail assumptions of some kind. By "logarithm
> works" I take it you mean "X is everywhere positive"? Not a severe
> restriction for many kinds of data, I think. But it's not clear why one
> would assume "logged values are normal" if one were trying to find out
> whether the original data were lognormal? (And if the variance weren't
> constant, surely the values in question would not be normal?)
Right, that's the bigger point. Who's to say that the log transformation is
likely to yield normality?
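
One crude way to see this is to look at the skewness of the logged values,
which should be near zero if they are normal. The sketch below is only an
illustration: the gamma distribution stands in for "positive, right-skewed,
but not log-normal" data, and its parameters, the seed, and the skewness
helper are my own choices. A near-zero skewness is necessary for normality,
not sufficient.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness: mean cubed z-score (about 0 for a symmetric sample)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(7)      # arbitrary seed
lognormal = [rng.lognormvariate(0.0, 1.0) for _ in range(20000)]
gamma = [rng.gammavariate(2.0, 1.0) for _ in range(20000)]  # stand-in skewed data

skew_log_lognormal = skewness([math.log(v) for v in lognormal])
skew_log_gamma = skewness([math.log(v) for v in gamma])

print("skewness of raw gamma data:      ", skewness(gamma))
print("skewness of log(log-normal data):", skew_log_lognormal)
print("skewness of log(gamma data):     ", skew_log_gamma)
```

For the gamma data the log overshoots: the raw values are skewed to the
right and the logged values are skewed to the left, so the transformation
has not produced normality.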
>
>
> > A more nonparametric approach based on smoothers and on Duan's smearing
> > estimator (to obtain mean on original scale) is worth investigating.
>
> Here as elsewhere, you seem to be aiming at seeking a value for the mean
> of X, X being the originally observed variable. If that were one's main
> interest, why bother to transform at all? The whole point (or at least
> one of the principal ones) of transforming is to end up with a variable
> that is more conveniently distributed than the one you started with; it
> would follow that estimates of the parameters of the transformed
> distribution would ordinarily be of more interest than those of the
> original, although sometimes the back-transformed estimates may have
> useful interpretations.
It is still of interest to transform if you are doing regression modeling,
even when the mean on the original scale is what you want to report. -Frank
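
For readers who have not met the smearing estimator mentioned above, here is
a minimal sketch. It regresses log(Y) on x, then rescales the
back-transformed prediction by the average of the exponentiated residuals
instead of by exp(.5 * sigma2), so normality of the residuals is not assumed.
Note the substitution: a straight-line least-squares fit is used here in
place of a smoother, purely to keep the example short. The simulated model,
the seed, and the helper names fit_line, smearing_factor, and predict_mean
are all hypothetical.

```python
import math
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def smearing_factor(residuals):
    """Duan's smearing factor: mean of the exponentiated log-scale residuals."""
    return statistics.fmean([math.exp(r) for r in residuals])

# Simulated data (assumed model): log(Y) = 0.5 + 1.0*x + error, error s.d. 0.6.
rng = random.Random(603)    # arbitrary seed
n = 5000
x = [rng.uniform(0.0, 2.0) for _ in range(n)]
y = [math.exp(0.5 + 1.0 * xi + rng.gauss(0.0, 0.6)) for xi in x]

log_y = [math.log(v) for v in y]
a, b = fit_line(x, log_y)
residuals = [ly - (a + b * xi) for xi, ly in zip(x, log_y)]
smear = smearing_factor(residuals)

def predict_mean(x_new):
    """Estimated mean of Y at x_new on the original scale."""
    return math.exp(a + b * x_new) * smear

x0 = 1.0
print("smearing factor:                   ", smear)
print("naive back-transform exp(a + b*x0):", math.exp(a + b * x0))
print("smearing estimate of mean Y at x0: ", predict_mean(x0))
print("mean of Y at x0 under the model:   ", math.exp(0.5 + x0 + 0.5 * 0.6 ** 2))
```

Because the simulated errors here are normal, the smearing factor should come
out close to exp(.5 * sigma2). The attraction of the method is that the same
code applies unchanged when they are not.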
>
> -- DFB.
> ------------------------------------------------------------------------
> Donald F. Burrill
> 348 Hyde Hall, Plymouth State College,
> MSC #29, Plymouth, NH 03264 603-535-2597
> 184 Nashua Road, Bedford, NH 03110 603-471-7128
--
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat