Hi People
Some possible clarification, re the contents of Michel David's chapter on the
lognormal.
As Ted says, a variable is 'lognormal' if the logarithm of the value is
Normal. In ordinary circumstances, we have a limited amount of sampling from a
very large population of potential samples. We generally wish to estimate
parameters of the population using statistics calculated from the limited
sampling.
The 'mean' discussed by Michel David is the arithmetic mean, the simple
average or centre of gravity of the values. For a set of samples, simply add
the numbers together and divide by n. This statistic is an unbiassed estimator
for the mean of the population no matter what the type of distribution followed
by the sample values.
If the samples come from a Normal distribution, the arithmetic mean of the
samples is the best possible estimator for the population mean.
If the samples do not come from a Normal distribution BUT the central limit
theorem applies AND you have enough samples, the arithmetic mean of the samples
is the best estimator of the population mean.
If the central limit theorem does not apply OR you do not have enough samples,
the arithmetic mean is not necessarily the best estimator for the population
mean. This is the case with the lognormal distribution, which does not conform
to the central limit theorem.
Sichel showed (in 1949) that the arithmetic mean of the samples was a poor
estimator for the mean of a lognormal population. A prominent statistician,
David Finney, did the same in 1941. Both derived the formula for converting the
mean and standard deviation of the logarithms to find the mean of the
'original' values. This is the formula quoted by Digby:
exp(mu + 0.5 * sigma^2)
Sichel expanded on this by showing that estimates of mu and sigma calculated
from the logarithm of the samples may be substituted for mu and sigma in the
above equation. If you have enough samples (more than 40) this will provide a
better estimator for the lognormal mean. Better means: maximum likelihood, more
efficient, 'closer to reality', with narrower confidence bounds. These bounds
will not be symmetrical, as in the Normal or CLT case. No amount of sampling is
sufficient to make that happen.
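As a rough illustration of the substitution described above, the sketch below computes both the arithmetic mean and exp(m + 0.5 * s^2) from one simulated sample, where m and s^2 are the mean and variance of the logarithms. The function name, the seed and the parameter values (mu = 0, sigma = 1, n = 200) are assumptions chosen purely for illustration. They are not taken from Sichel or Finney, and no small-sample bias correction is applied.

```python
import math
import random
import statistics


def lognormal_mean_estimates(sample):
    """Return (arithmetic mean, exp(m + 0.5 * s^2)) for a positive-valued sample.

    m and s^2 are the mean and the sample variance (n - 1 divisor) of the
    natural logarithms of the sample values.
    """
    logs = [math.log(x) for x in sample]
    m = statistics.fmean(logs)
    s2 = statistics.variance(logs)
    return statistics.fmean(sample), math.exp(m + 0.5 * s2)


# Illustrative parameters only (assumed, not from the thread).
mu, sigma, n = 0.0, 1.0, 200
rng = random.Random(1)
sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]

arithmetic, plug_in = lognormal_mean_estimates(sample)
print("population mean      :", math.exp(mu + 0.5 * sigma ** 2))
print("arithmetic mean      :", arithmetic)
print("exp(m + 0.5 * s^2)   :", plug_in)
```

Re-running with different seeds and smaller n gives a feel for how much each estimate moves around from sample to sample.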
If you have fewer than 40 samples, the above formula gives a biassed estimate
(and confidence bounds). Sichel provided tables to correct this bias and
calculate the confidence bounds. If you do not have access to Sichel or
Finney's work, you might find my 1987 SAIMM paper useful. It can be downloaded
by following the publications link at
http://uk.geocities.com/drisobelclark/resume
By the way, Michel David's book gives the formula using natural logarithm and
exponential backtransform. However, the tables provided in the earlier editions
are actually for use with logarithms to the base 10!
Hope this helps
Isobel
[EMAIL PROTECTED] wrote:
On 30-Dec-06 Digby Millikan wrote:
> Hello,
> I looked in M.David's '""'
> and he says 'the mean of a lognormal distribution tends to be
> overestimated by using the arithmetic mean' so he goes on to
> provide a formula for estimating the mean which includes the
> variance. Reading this statement led me to wonder what is
> the mean? So I looked up the definition of 'mean' in my Cambridge
> Dictionary of Statistics and it gives the formula for the
> arithmetic mean. So what is the mean that Michel is referring to
> as compared to what the Cambridge dictionary of statistics is
> referring to? Does the mean Michel is referring to mean 'the most
> likely value'? We certainly do not use the arithmetic mean when
> we use lognormal kriging because we tend to overestimate values.
> I am a bit confused about the definition of mean, according to
> Michel, it is not the average, but our most likely value,
> is this correct?
The term "mean" (of a set of values), unless otherwise qualified
(e.g. geometric mean, harmonic mean) refers to the arithmetic mean
of the values. Median (and mode, the "most likely value") are not,
strictly speaking, means; though in many circumstances they can
be used as workable substitutes.
The mean of a distribution is the expected value of a number drawn
from that distribution. In effect, you can look on it as the arithmetic
mean of all possible values, with frequencies of occurrence given
by the distribution. The mean (or expectation) of a distribution is
always estimated without bias by the arithmetic mean of a sample
from that distribution. Hence there can be no "tendency to
overestimation" by the arithmetic mean.
Therefore what M. David means is probably something else, and the
following is a possible (and indeed plausible) interpretation.
A variable which has a lognormal distribution is such that its
natural logarithm (to base e) has a normal distribution.
Equivalently, the exponential of a normally distributed variable
has a lognormal distribution. These are the *definition* of
"lognormal".
Now, the normal distribution occurring in the definition has,
of course, a mean and a variance as parameters which fix which
normal distribution it is.
If, therefore, Z has a normal distribution with mean mu and
variance V, then
X = exp(Z)
has a lognormal distribution with parameters mu and V which are
the mean and variance of the "underlying" normal distribution.
Conversely, Z = log(X) has a normal distribution N(mu,V).
Now: the expected value (the "real" mean) of X is given by
exp(mu + V/2)
where, again, mu is the "underlying" mean. This is certainly
larger than mu (the mean of the underlying normal), being greater
than 1 + mu + V/2. So maybe there is a confusion over which
mean is meant: the mean of Z, or the mean of X.
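For readers who want to see where exp(mu + V/2) comes from, here is a short derivation written in LaTeX notation. It uses only the definitions above (Z normal with mean mu and variance V, X = exp(Z)) and completes the square in the exponent. The final line uses the standard inequality exp(x) >= 1 + x.

```latex
\begin{align*}
E[X] = E[e^{Z}]
  &= \int_{-\infty}^{\infty} e^{z}\,
     \frac{1}{\sqrt{2\pi V}}
     \exp\!\left(-\frac{(z-\mu)^{2}}{2V}\right) dz \\
  &= e^{\mu + V/2}
     \int_{-\infty}^{\infty}
     \frac{1}{\sqrt{2\pi V}}
     \exp\!\left(-\frac{(z-\mu-V)^{2}}{2V}\right) dz \\
  &= e^{\mu + V/2},
\end{align*}
since
\[
  z - \frac{(z-\mu)^{2}}{2V}
  = -\frac{(z-\mu-V)^{2}}{2V} + \mu + \frac{V}{2}
\]
and the remaining integrand is the density of $N(\mu + V,\, V)$, which
integrates to 1. Because $e^{x} \ge 1 + x$ for all real $x$,
\[
  e^{\mu + V/2} \;\ge\; 1 + \mu + \frac{V}{2} \;>\; \mu .
\]
```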
But there is also another possible confusion. Many people have
the impression that because X = exp(Z), if you estimate the
mean of Z (which is mu) you can recover the mean of X by
"antilog", i.e. exp(mu). Since log(X) is normal, its mean (mu)
is estimated without bias by the arithmetic mean of the values
of log(X) in a sample.
So[???] "the mean of X is exp(arithmetic mean of log(sample))."
But this is wrong, since you also need the extra factor exp(V/2).
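A small simulation may make the point concrete. The sketch below back-transforms the mean of the logarithms (which is the geometric mean of the sample) and compares it with the arithmetic mean. The function name, seed and parameter values (mu = 0, V = 1, n = 100000) are assumptions for illustration only.

```python
import math
import random
import statistics


def backtransformed_log_mean(sample):
    """exp(arithmetic mean of log(sample)), i.e. the geometric mean."""
    return math.exp(statistics.fmean([math.log(x) for x in sample]))


# Illustrative parameters only (assumed, not from the thread).
mu, V, n = 0.0, 1.0, 100_000
rng = random.Random(42)
# lognormvariate takes the standard deviation of the underlying normal.
sample = [rng.lognormvariate(mu, math.sqrt(V)) for _ in range(n)]

naive = backtransformed_log_mean(sample)
arithmetic = statistics.fmean(sample)

print("exp(mu)               :", math.exp(mu))
print("exp(mu + V/2)         :", math.exp(mu + V / 2))
print("exp(mean of logs)     :", naive)
print("arithmetic mean of X  :", arithmetic)
```

With a sample this large, the back-transformed value should settle near exp(mu) and the arithmetic mean near exp(mu + V/2). The ratio between the two is the missing factor exp(V/2).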
While there is nothing wrong with using the arithmetic mean of X
to estimate the mean of its lognormal distribution (i.e. it is
unbiased, no tendency to over-estimate), what is true is that the
values of X which are above the mean will indeed include values
which deviate much further from the mean than values which are
below the mean -- the lognormal distribution is positively skewed
(has a long upper tail).
So here is another possible confusion of interpretation:
In taking the arithmetic mean of a lognormal sample, you will
sometimes get a result which is a long way above the mean, compared
with results which are below the mean. That could also be an
interpretation of M. David's "the mean of a lognormal distribution
tends to be overestimated by using the arithmetic mean."
But this would not agree with the usual interpretation of "tends
to be overestimated", which would either be "the estimate has a
positive bias" (i.e. in the long run average gives a result which
is too large), or else "the estimate [while unbiased] is more often
above the mean than below it".
It can't agree with the first, because the arithmetic mean is
unbiased. And it can't agree with the second, because in fact
the probability that the arithmetic mean lies above the mean
(the expectation of the distribution) is less than 1/2: more
results will lie below the mean than lie above the mean (as needed
indeed to compensate for those occasional very high values).
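Both statements can be checked by simulation. The sketch below draws many small lognormal samples, takes the arithmetic mean of each, and reports the long-run average of those means together with the fraction that land above the true mean. The function name, seed and settings (mu = 0, sigma = 1, samples of size 10, 20000 repetitions) are assumptions for illustration only.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Arithmetic means of `reps` independent lognormal samples of size n."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.lognormvariate(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]


# Illustrative parameters only (assumed, not from the thread).
mu, sigma = 0.0, 1.0
true_mean = math.exp(mu + 0.5 * sigma ** 2)

means = simulate_sample_means(mu, sigma, n=10, reps=20_000)
average_of_means = statistics.fmean(means)
fraction_above = sum(m > true_mean for m in means) / len(means)

print("true mean                  :", true_mean)
print("average of sample means    :", average_of_means)
print("fraction above true mean   :", fraction_above)
```

The average of the sample means should sit close to the true mean (no bias), while the fraction above the true mean should come out below one half, reflecting the long upper tail.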
I can only conclude (without knowing M. David's "Geostatistical
Ore Reserve Estimation") that he is writing either in confusion
about correct statistical terminology, or in a loose way without
proper definition or explanation of what he means.
Hoping this helps!
And Happy New Year to all.
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding)
Fax-to-email: +44 (0)870 094 0861
Date: 30-Dec-06 Time: 20:50:00
------------------------------ XFMail ------------------------------