Re: Data sample and log normal distribution

Donald F. Burrill Mon, 29 Nov 1999 23:50:51 -0800
On Fri, 26 Nov 1999, Frank E Harrell Jr wrote:

> Beware - you can't just anti-log the mean and s.d.  The median 
> unlogged value is the antilog of the mean of the logged values. 

That's interesting.  The antilog of the mean of log(X) is the geometric 
mean of X.  Is the geometric mean necessarily also the median?  I 
wouldn't have thought so, but haven't had occasion to consider the matter 
before now.  I'll have to think about that some.  (For one thing, the 
geometric mean, like the arithmetic mean, has a unique, well-defined 
value;  the median needn't even exist and when it exists it needn't be 
unique, although there exist algorithms that generate a unique value, 
always by introducing an extra assumption or few, and mainly (one 
suspects) for the benefit of those who have a low tolerance for ambiguity.) 
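
A small numerical check of the question, in Python with made-up samples. The helper name geometric_mean and the three samples are assumptions for this sketch only. It suggests that the geometric mean and the median agree when the logged values are symmetric about their mean, and need not agree otherwise; it also shows that for an even number of observations more than one value qualifies as a median.

```python
import math
import statistics

def geometric_mean(xs):
    """Antilog of the arithmetic mean of the logs; requires every x > 0."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Made-up samples, chosen only for illustration.
symmetric = [1, 10, 100]        # logs are symmetric about log(10)
skewed = [1, 2, 4, 8, 1000]     # logs are strongly right-skewed
even = [1, 2, 4, 8]             # even n: every value in [2, 4] is a median

for name, xs in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name, "geometric mean:", geometric_mean(xs),
          "median:", statistics.median(xs))

# The lower and upper medians bracket the conventional midpoint value.
print("even-n medians:", statistics.median_low(even),
      statistics.median(even), statistics.median_high(even))
```

In the skewed sample the geometric mean lands well above the median, so the two are not the same quantity in general.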

> The mean unlogged value is something like exp(mean unlogged + .5sigma2) 
> where sigma2=sd of logged values. 

Did you mean "sigma2 = variance of logged values"?  (Why else represent 
it as "sigma2" instead of just as "sigma"?)
I take this sentence to mean (reading "mean unlogged" as the mean of the 
logged values)
        X-bar  =  exp(mean(logX) + .5(var(logX))), approximately. 
 (Or should I be thinking of the population mean of X instead of the 
sample mean?)  Is this true in general, or only if X is actually 
lognormally distributed? 
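
For reference, the standard result for a lognormal variable: if log(X) is normal with mean mu and variance sigma^2, the population mean of X is exp(mu + sigma^2/2) and the population median is exp(mu). The derivation relies on normality of log(X), so it should not be expected to hold for arbitrary positive X. The sketch below is a simulation check in Python under that lognormal assumption; the seed, sample size, parameter values and the helper name backtransformed_mean are arbitrary choices for illustration.

```python
import math
import random
import statistics

def backtransformed_mean(xs):
    """Plug-in estimate exp(mean(log x) + 0.5 * var(log x))."""
    logs = [math.log(x) for x in xs]
    return math.exp(statistics.fmean(logs) + 0.5 * statistics.variance(logs))

# Simulated lognormal data (assumed parameters, fixed seed for repeatability).
rng = random.Random(1999)
mu, sigma = 0.0, 1.0
sample = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]

theory_mean = math.exp(mu + 0.5 * sigma ** 2)   # population mean of X
theory_median = math.exp(mu)                    # population median of X

print("theoretical mean   :", theory_mean)
print("sample mean        :", statistics.fmean(sample))
print("plug-in estimate   :", backtransformed_mean(sample))
print("theoretical median :", theory_median)
print("sample median      :", statistics.median(sample))
```

With a sample this large, the sample mean and the plug-in estimate should both sit close to the theoretical mean, and the sample median close to exp(mu).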

> This is also a very assumption-laden approach (logarithm works, logged 
> values are normal with constant variance). 

Most approaches seem to entail assumptions of some kind.  By "logarithm 
works" I take it you mean "X is everywhere positive"?  Not a severe 
restriction for many kinds of data, I think.  But it's not clear why one 
would assume "logged values are normal" if one were trying to find out 
whether the original data were lognormal?  (And if the variance weren't 
constant, surely the values in question would not be normal?)

> A more nonparametric approach based on smoothers and on Duan's smearing
> estimator (to obtain mean on original scale) is worth investigating. 

Here as elsewhere, you seem to be seeking a value for the mean 
of X, X being the originally observed variable.  If that were one's main 
interest, why bother to transform at all?  The whole point (or at least 
one of the principal ones) of transforming is to end up with a variable 
that is more conveniently distributed than the one you started with;  it 
would follow that estimates of the parameters of the transformed 
distribution would ordinarily be of more interest than those of the 
original, although sometimes the back-transformed estimates may have 
useful interpretations.
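
A minimal sketch of the smearing estimator mentioned in the quoted text, as it is usually described: fit a model to log(Y), then multiply the naive back-transform exp(fitted value) by the average of exp(residual). The function names and the data are hypothetical, and the straight-line fit is hand-rolled least squares rather than any particular package's routine. With no predictors at all, the estimate reduces algebraically to the plain arithmetic mean of Y, which bears on the question of why one would transform in that case.

```python
import math

def smearing_mean(ys):
    """Smearing estimate of E[Y] with an intercept-only model for log(Y)."""
    logs = [math.log(y) for y in ys]
    l_bar = sum(logs) / len(logs)
    smear = sum(math.exp(l - l_bar) for l in logs) / len(logs)
    return math.exp(l_bar) * smear

def smearing_predict(xs, ys, x0):
    """Smearing estimate of E[Y | x0] from a least-squares line for log(Y) on x."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
    slope = sxy / sxx
    intercept = l_bar - slope * x_bar
    residuals = [l - (intercept + slope * x) for x, l in zip(xs, logs)]
    # Average of the antilogged residuals: the "smearing" correction factor.
    smear = sum(math.exp(r) for r in residuals) / n
    return math.exp(intercept + slope * x0) * smear

# Made-up data for illustration only.
ys = [1, 2, 4, 8, 1000]
print("intercept-only smearing:", smearing_mean(ys))
print("arithmetic mean        :", sum(ys) / len(ys))

xs = [0, 1, 2, 3, 4]
print("prediction at x = 5    :", smearing_predict(xs, ys, 5))
```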
                        -- DFB.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-471-7128