AI-GEOSTATS: RE: Mean of a lognormal

Isobel Clark Sat, 30 Dec 2006 16:25:16 -0800

Hi People
   
  Some possible clarification, re the contents of Michel David's chapter on the 
lognormal.
   
  As Ted says, a variable is 'lognormal' if the logarithm of the value is 
Normal. In ordinary circumstances, we have a limited amount of sampling from a 
very large population of potential samples. We generally wish to estimate 
parameters of the population using statistics calculated from the limited 
sampling.
   
  The 'mean' discussed by Michel David is the arithmetic mean, the simple 
average or centre of gravity of the values. For a set of samples, simply add 
the numbers together and divide by n. This statistic is an unbiassed estimator 
for the mean of the population no matter what the type of distribution followed 
by the sample values. 
   
  If the samples come from a Normal distribution, the arithmetic mean of the 
samples is the best possible estimator for the population mean.

  If the samples do not come from a Normal distribution BUT the central limit 
theorem applies AND you have enough samples, the arithmetic mean of the samples 
is the best estimator of the population mean.

  If the central limit theorem does not apply OR you do not have enough samples, 
the arithmetic mean is not necessarily the best estimator for the population 
mean. This is the case with the lognormal distribution, which does not conform 
to the central limit theorem.

  Sichel showed (in 1949) that the arithmetic mean of the samples was a poor 
estimator for the mean of a lognormal population. A prominent statistician, 
David Finney, did the same in 1941. Both derived the formula for converting the 
mean and standard deviation of the logarithms to find the mean of the 
'original' values. This is the formula quoted by Digby:

  exp(mu + 0.5 * sigma^2) 

  Sichel expanded on this by showing that estimates of mu and sigma calculated 
from the logarithm of the samples may be substituted for mu and sigma in the 
above equation. If you have enough samples (more than 40) this will provide a 
better estimator for the lognormal mean. Better means: maximum likelihood, more 
efficient, 'closer to reality', with narrower confidence bounds. These bounds 
will not be symmetrical, as in the Normal or CLT case. No amount of sampling is 
sufficient to make that happen.
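
  As a rough illustration of the substitution described above, here is a 
minimal Python sketch. The function name lognormal_mean_plugin, the simulated 
data and the choice of the divide-by-n variance are assumptions made for this 
example; they are not taken from Sichel's or Finney's papers, and no 
small-sample bias correction is applied.

```python
import math
import random
import statistics


def lognormal_mean_plugin(samples):
    """Plug-in estimate exp(m + 0.5 * s2), with m and s2 taken from the logs."""
    logs = [math.log(x) for x in samples]   # natural logarithms
    m = statistics.fmean(logs)              # estimate of mu
    s2 = statistics.pvariance(logs)         # divide-by-n variance (assumed choice)
    return math.exp(m + 0.5 * s2)


# Hypothetical data: 200 draws from a lognormal with mu = 1.0, sigma = 0.8.
rng = random.Random(2006)
data = [rng.lognormvariate(1.0, 0.8) for _ in range(200)]

print("true mean       :", math.exp(1.0 + 0.5 * 0.8 ** 2))
print("arithmetic mean :", statistics.fmean(data))
print("plug-in estimate:", lognormal_mean_plugin(data))
```

  Both estimates should land near the true mean for a sample of this size; 
the sketch only shows the mechanics of the calculation, not the relative 
efficiency of the two estimators.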

  If you have fewer than 40 samples, the above formula gives a biassed estimate 
(and confidence bounds). Sichel provided tables to correct this bias and 
calculate the confidence bounds. If you do not have access to Sichel or 
Finney's work, you might find my 1987 SAIMM paper useful. It can be downloaded 
by following the publications link at 
http://uk.geocities.com/drisobelclark/resume

  By the way, Michel David's book gives the formula using natural logarithm and 
exponential backtransform. However, the tables provided in the earlier editions 
are actually for use with logarithms to the base 10!

  Hope this helps
  Isobel

[EMAIL PROTECTED] wrote:
  On 30-Dec-06 Digby Millikan wrote:
> Hello,
> I looked in M.David's '""'
> and he says 'the mean of a lognormal distribution tends to be
> overestimated by using the arithmetic mean' so he goes on to
> provide a formula for estimating the mean which includes the
> variance. Reading this statement led me to wonder what is 
> the mean? So I looked up the definition of 'mean' in my Cambridge
> Dictionary of Statistics and it gives the formula for the
> arithmetic mean. So what is the mean that Michel is referring to
> as compared to what the Cambridge dictionary of statistics is
> referring to? Does the mean Michel is referring to mean 'the most
> likely value'? We certainly do not use the arithmetic mean when
> we use lognormal kriging because we tend to overestimate values.
> I am a bit confused about the definition of mean, according to
> Michel, it is not the average, but our most likely value,
> is this correct?

The term "mean" (of a set of values), unless otherwise qualified
(e.g. geometric mean, harmonic mean) refers to the arithmetic mean
of the values. Median (and mode, the "most likely value") are not,
strictly speaking, means; though in many circumstances they can
be used as workable substitutes.

The mean of a distribution is the expected value of a number drawn
from that distribution. In effect, you can look on it as the arithmetic
mean of all possible values, with frequencies of occurrence given
by the distribution. The mean (or expectation) of a distribution is
always estimated without bias by the arithmetic mean of a sample
from that distribution. Hence there can be no "tendency to
overestimation" by the arithmetic mean.

Therefore what M. David means is probably something else, and the
following is a possible (and indeed plausible) interpretation.

A variable which has a lognormal distribution is such that its
natural logarithm (to base e) has a normal distribution.
Equivalently, the exponential of a normally distributed variable
has a lognormal distribution. These are equivalent *definitions* of
"lognormal".

Now, the normal distribution occurring in the definition has,
of course, a mean and a variance as parameters which fix which
normal distribution it is.

If, therefore, Z has a normal distribution with mean mu and
variance V, then

X = exp(Z)

has a lognormal distribution with parameters mu and V which are
the mean and variance of the "underlying" normal distribution.

Conversely, Z = log(X) has a normal distribution N(mu,V).

Now: the expected value (the "real" mean) of X is given by

exp(mu + V/2)

where, again, mu is the "underlying" mean. This is certainly
larger than mu (the mean of the underlying normal), being greater
than 1 + mu + V/2. So maybe there is a confusion over which
mean is meant: the mean of Z, or the mean of X.
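
For anyone who wants to check the expression exp(mu + V/2), it follows from
completing the square in the normal density. The working below is the standard
derivation written in LaTeX; the dummy variable z and the symbol t are
notation assumed for this sketch.

```latex
\begin{align*}
E[X] = E\left[e^{Z}\right]
  &= \int_{-\infty}^{\infty} e^{z}\,\frac{1}{\sqrt{2\pi V}}
     \exp\!\left(-\frac{(z-\mu)^{2}}{2V}\right) dz \\
  &= e^{\mu + V/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi V}}
     \exp\!\left(-\frac{(z-\mu-V)^{2}}{2V}\right) dz \\
  &= e^{\mu + V/2},
\end{align*}
% using  z - (z-\mu)^2/(2V) = \mu + V/2 - (z-\mu-V)^2/(2V),
% so the remaining integrand is the N(\mu + V, V) density and integrates to 1.
% The bound quoted in the text is  e^{t} \ge 1 + t  with  t = \mu + V/2.
```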

But there is also another possible confusion. Many people have
the impression that because X = exp(Z), if you estimate the
mean of Z (which is mu) you can recover the mean of X by
"antilog", i.e. exp(mu). Since log(X) is normal, its mean (mu)
is estimated without bias by the arithmetic mean of the values
of log(X) in a sample.

So[???] "the mean of X is exp(arithmetic mean of log(sample))."

But this is wrong, since you also need the extra factor exp(V/2).
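
To put a number on the missing factor, here is a small Python sketch. The
function names and the parameter values are hypothetical, chosen only for
illustration. The antilog of the mean of the logs is the geometric mean of the
sample, and it aims at exp(mu), which is the median of the lognormal rather
than its mean.

```python
import math


def geometric_mean(samples):
    """exp of the arithmetic mean of the natural logs of positive samples."""
    logs = [math.log(x) for x in samples]
    return math.exp(sum(logs) / len(logs))


def lognormal_mean(mu, V):
    """Expected value of X = exp(Z) when Z is N(mu, V)."""
    return math.exp(mu + V / 2.0)


mu, V = 2.0, 1.5                         # hypothetical parameters
naive = math.exp(mu)                     # what the plain antilog aims at
ratio = lognormal_mean(mu, V) / naive    # equals exp(V/2)
print(naive, lognormal_mean(mu, V), ratio)
```

The larger V is, the further the plain antilog falls short of the mean of X.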

While there is nothing wrong with using the arithmetic mean of X
to estimate the mean of its lognormal distribution (i.e. it is
unbiased, no tendency to over-estimate), what is true is that the
values of X which are above the mean will indeed include values
which deviate much further from the mean than values which are
below the mean -- the lognormal distribution is positively skewed
(has a long upper tail).

So here is another possible confusion of interpretation:
In taking the arithmetic mean of a lognormal sample, you will
sometimes get a result which is a long way above the mean, compared
with results which are below the mean. That could also be an
interpretation of M. David's "the mean of a lognormal distribution
tends to be overestimated by using the arithmetic mean."

But this would not agree with the usual interpretation of "tends
to be overestimated", which would either be "the estimate has a
positive bias" (i.e. in the long run average gives a result which
is too large), or else "the estimate [while unbiased] is more often
above the mean than below it".

It can't agree with the first, because the arithmetic mean is
unbiased. And it can't agree with the second, because in fact
the probability that the arithmetic mean lies above the mean
(the expectation of the distribution) is less than 1/2: more
results will lie below the mean than lie above the mean (as needed
indeed to compensate for those occasional very high values).
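
Both points can be checked by simulation. The following is only a sketch in
Python: the function name, the parameter values (mu = 0, sigma = 1, samples of
size 10) and the number of trials are assumptions picked for illustration.

```python
import math
import random


def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Return (average of the sample means, fraction of sample means above E[X])."""
    rng = random.Random(seed)
    true_mean = math.exp(mu + 0.5 * sigma ** 2)
    total = 0.0
    above = 0
    for _ in range(trials):
        # Arithmetic mean of one lognormal sample of size n.
        xbar = sum(rng.lognormvariate(mu, sigma) for _ in range(n)) / n
        total += xbar
        if xbar > true_mean:
            above += 1
    return total / trials, above / trials


avg, frac_above = simulate_sample_means(mu=0.0, sigma=1.0, n=10, trials=20000)
print("true mean              :", math.exp(0.5))
print("average of sample means:", avg)
print("fraction above the mean:", frac_above)
```

The average of the sample means should sit close to the true mean (no bias),
while the fraction of sample means lying above it should come out below 1/2.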

I can only conclude (without knowing M. David's "Geostatistical
Ore Reserve Estimation") that he is writing either in confusion
about correct statistical terminology, or in a loose way without
proper definition or explanation of what he means.

Hoping this helps!

And Happy New Year to all.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 30-Dec-06 Time: 20:50:00
------------------------------ XFMail ------------------------------
