On Fri, 26 Nov 1999, Frank E Harrell Jr wrote:
> Beware - you can't just anti-log the mean and s.d. The median
> unlogged value is the antilog of the mean of the logged values.
That's interesting. The antilog of the mean of log(X) is the geometric
mean of X. Is the geometric mean necessarily also the median? I
wouldn't have thought so, but haven't had occasion to consider the matter
before now. I'll have to think about that some. (For one thing, the
geometric mean, like the arithmetic mean, has a unique, well-defined
value; the median needn't even exist and when it exists it needn't be
unique, although there exist algorithms that generate a unique value,
always by introducing an extra assumption or two, and mainly (one
suspects) for the benefit of those who have a low tolerance for ambiguity.)
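A quick simulation suggests an answer to the question above. What
follows is only a minimal sketch: the sample sizes, the seed, and the
choice of an exponential distribution as the counterexample are
assumptions made for illustration, and geometric_mean is a helper
defined here. When log(X) is symmetric about its mean (as for lognormal
X), the geometric mean and the median should agree closely. For
exponential X with rate 1, theory gives a geometric mean of about 0.56
against a median of about 0.69, so the two need not coincide in general.

```python
import math
import random
import statistics


def geometric_mean(xs):
    """Antilog of the mean of the logs; requires every x > 0."""
    return math.exp(statistics.fmean(math.log(x) for x in xs))


rng = random.Random(1999)  # arbitrary seed, used only for reproducibility

# Lognormal sample: log(X) is normal, hence symmetric about its mean.
lognormal = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]

# Exponential sample: X is positive, but log(X) is skewed.
exponential = [rng.expovariate(1.0) for _ in range(100_000)]

for name, xs in (("lognormal", lognormal), ("exponential", exponential)):
    print(f"{name:12s} geometric mean = {geometric_mean(xs):.3f}  "
          f"median = {statistics.median(xs):.3f}")
```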
> The mean unlogged value is something like exp(mean unlogged + .5sigma2)
> where sigma2=sd of logged values.
Did you mean "sigma2 = variance of logged values"? (Why else represent
it as "sigma2" instead of just as "sigma"?)
I take this sentence to mean
X-bar = exp(mean(logX) + .5(var(logX))), approximately.
(Or should I be thinking of the population mean of X instead of the
sample mean?) Is this true in general, or only if X is actually
lognormally distributed?
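On the last question: the population identity E[X] = exp(mu +
.5*sigma^2), with mu and sigma^2 the mean and variance of log(X), holds
exactly when log(X) is normal; for other distributions it is at best an
approximation. The sketch below checks the lognormal case numerically.
The parameter values, seed, and sample size are hypothetical choices
made only for this illustration.

```python
import math
import random
import statistics

rng = random.Random(26111999)  # arbitrary seed, used only for reproducibility
mu, sigma = 1.0, 0.8           # hypothetical mean and s.d. of log(X)

x = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
logs = [math.log(v) for v in x]

m = statistics.fmean(logs)       # sample mean of log(X)
s2 = statistics.variance(logs)   # sample variance of log(X)

naive = math.exp(m)                 # geometric mean; estimates the median of X
corrected = math.exp(m + 0.5 * s2)  # estimates E[X] if log(X) is normal
arithmetic = statistics.fmean(x)    # direct sample mean of X

print(f"exp(mean log)             = {naive:.3f}")
print(f"exp(mean log + .5 var)    = {corrected:.3f}")
print(f"arithmetic mean of X      = {arithmetic:.3f}")
```

With these settings the corrected value should land close to the
arithmetic mean, while the plain antilog of the mean falls well short
of it.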
> This is also a very assumption-laden approach (logarithm works, logged
> values are normal with constant variance).
Most approaches seem to entail assumptions of some kind. By "logarithm
works" I take it you mean "X is everywhere positive"? Not a severe
restriction for many kinds of data, I think. But it's not clear why one
would assume "logged values are normal" if one were trying to find out
whether the original data were lognormal. (And if the variance weren't
constant, surely the values in question would not be normal?)
> A more nonparametric approach based on smoothers and on Duan's smearing
> estimator (to obtain mean on original scale) is worth investigating.
Here as elsewhere, you seem to be seeking a value for the mean
of X, X being the originally observed variable. If that were one's main
interest, why bother to transform at all? The whole point (or at least
one of the principal ones) of transforming is to end up with a variable
that is more conveniently distributed than the one you started with; it
would follow that estimates of the parameters of the transformed
distribution would ordinarily be of more interest than those of the
original, although sometimes the back-transformed estimates may have
useful interpretations.
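For readers who have not met the smearing estimator mentioned in the
quoted passage, here is a minimal sketch of the idea as it is usually
described: regress log(Y) on X, then multiply the back-transformed
fitted value by the average of the exponentiated residuals rather than
by the normal-theory factor exp(.5*variance). Everything specific below
is an assumption made for illustration: the simulated data, the
coefficients, the deliberately skewed errors, the hand-written least
squares fit, and the helper name predict_mean_y. It uses a straight-line
fit, not the smoothers suggested in the quoted passage.

```python
import math
import random
import statistics

rng = random.Random(7)  # arbitrary seed, used only for reproducibility

# Hypothetical data: log(y) = 0.5 + 0.3*x + error, where the error has
# mean zero but is skewed (so the normal-theory correction is off).
n = 20_000
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
log_y = [0.5 + 0.3 * xi + 0.8 * (1.0 - rng.expovariate(1.0)) for xi in x]

# Ordinary least squares of log(y) on x, written out by hand.
x_bar = statistics.fmean(x)
ly_bar = statistics.fmean(log_y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))
slope = sxy / sxx
intercept = ly_bar - slope * x_bar
residuals = [li - (intercept + slope * xi) for xi, li in zip(x, log_y)]

# Smearing factor: the average of the exponentiated residuals.
smear = statistics.fmean(math.exp(r) for r in residuals)

# Normal-theory factor, for comparison only.
normal_factor = math.exp(0.5 * statistics.variance(residuals))


def predict_mean_y(x_new):
    """Estimated mean of y (original scale) at x_new, using smearing."""
    return math.exp(intercept + slope * x_new) * smear


print(f"smearing factor      = {smear:.3f}")
print(f"normal-theory factor = {normal_factor:.3f}")
print(f"estimated mean of y at x = 5: {predict_mean_y(5.0):.3f}")
```

Because the simulated errors are not normal, the two correction factors
should differ noticeably here; when the errors really are normal, they
should be close.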
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill
348 Hyde Hall, Plymouth State College,
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128