RE: AI-GEOSTATS: Mean

Ted Harding Sat, 30 Dec 2006 14:37:17 -0800

On 30-Dec-06 Digby Millikan wrote:
> Hello,
> I looked in M. David's "Geostatistical Ore Reserve Estimation"
> and he says 'the mean of a lognormal distribution tends to be
> overestimated by using the arithmetic mean' so he goes on to
> provide a formula for estimating the mean which includes the
> variance. Reading this statement led me to wonder what is
> the mean? So I looked up the definition of 'mean' in my Cambridge
> Dictionary of Statistics and it gives the formula for the
> arithmetic mean. So what is the mean that Michel is referring to
> as compared to what the Cambridge dictionary of statistics is
> referring to? Does the mean Michel is referring to mean 'the most
> likely value'? We certainly do not use the arithmetic mean when
> we use lognormal kriging because we tend to overestimate values.
> I am a bit confused about the definition of mean, according to
> Michel, it is not the average, but our most likely value,
> is this correct?


The term "mean" (of a set of values), unless otherwise qualified
(e.g. geometric mean, harmonic mean) refers to the arithmetic mean
of the values. Median (and mode, the "most likely value") are not,
strictly speaking, means; though in many circumstances they can
be used as workable substitutes.

The mean of a distribution is the expected value of a number drawn
from that distribution. In effect, you can look on it as the arithmetic
mean of all possible values, with frequencies of occurrence given
by the distribution. The mean (or expectation) of a distribution is
always estimated without bias by the arithmetic mean of a sample
from that distribution. Hence there can be no "tendency to
overestimation" by the arithmetic mean.
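
The unbiasedness claim above can be written out in one line. This is
only a sketch, assuming a sample X_1, ..., X_n drawn from a
distribution whose mean E[X] exists and is finite (true for the
lognormal); the symbols n and \bar{X} are introduced here for
illustration and do not appear in the original message.

```latex
\mathbb{E}[\bar{X}]
  = \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
  = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[X_i]
  = \mathbb{E}[X]
```

The middle step uses only linearity of expectation, so it does not
depend on the shape of the distribution.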

Therefore what M. David means is probably something else, and the
following is a possible (and indeed plausible) interpretation.

A variable which has a lognormal distribution is such that its
natural logarithm (to base e) has a normal distribution.
Equivalently, the exponential of a normally distributed variable
has a lognormal distribution. Either statement serves as the
*definition* of "lognormal".

Now, the normal distribution occurring in the definition has,
of course, a mean and a variance as parameters which fix which
normal distribution it is.

If, therefore, Z has a normal distribution with mean mu and
variance V, then

  X = exp(Z)

has a lognormal distribution with parameters mu and V which are
the mean and variance of the "underlying" normal distribution.

Conversely, Z = log(X) has a normal distribution N(mu,V).

Now: the expected value (the "real" mean) of X is given by

  exp(mu + V/2)

where, again, mu is the "underlying" mean. This is certainly
larger than mu (the mean of the underlying normal), being greater
than 1 + mu + V/2. So maybe there is a confusion over which
mean is meant: the mean of Z, or the mean of X.
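
For readers who want to see where exp(mu + V/2) comes from, here is a
sketch of the standard derivation by completing the square, assuming
Z ~ N(mu, V) with V > 0 and X = exp(Z) as defined above.

```latex
\mathbb{E}[X]
  = \int_{-\infty}^{\infty} e^{z}\,
    \frac{1}{\sqrt{2\pi V}}
    \exp\!\left(-\frac{(z-\mu)^2}{2V}\right) dz

z - \frac{(z-\mu)^2}{2V}
  = -\frac{(z-\mu-V)^2}{2V} + \mu + \frac{V}{2}

\mathbb{E}[X]
  = e^{\mu + V/2}
    \int_{-\infty}^{\infty}
    \frac{1}{\sqrt{2\pi V}}
    \exp\!\left(-\frac{(z-\mu-V)^2}{2V}\right) dz
  = e^{\mu + V/2}
```

The remaining integral is the total probability of a N(mu + V, V)
density, which equals 1.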

But there is also another possible confusion. Many people have
the impression that because X = exp(Z), if you estimate the
mean of Z (which is mu) you can recover the mean of X by
"antilog", i.e. exp(mu). Since log(X) is normal, its mean (mu)
is estimated without bias by the arithmetic mean of the values
of log(X) in a sample.

So[???] "the mean of X is exp(arithmetic mean of log(sample))."

But this is wrong, since you also need the extra factor exp(V/2).
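
A small simulation may make the difference concrete. The sketch below
is illustrative only: the function name, the parameter values
(mu = 0, V = 1), the sample size and the seed are all assumptions
chosen for the example. The "corrected" estimate is the simple plug-in
exp(mean + variance/2) of the logged sample, which is not necessarily
the formula given in M. David's book and is itself not exactly
unbiased in small samples.

```python
import math
import random
import statistics


def compare_estimators(mu=0.0, V=1.0, n=100_000, seed=12345):
    """Compare three estimates of the mean of X = exp(Z), Z ~ N(mu, V).

    All argument values are illustrative assumptions.
    """
    rng = random.Random(seed)
    z = [rng.gauss(mu, math.sqrt(V)) for _ in range(n)]  # underlying normal
    x = [math.exp(v) for v in z]                         # lognormal sample

    zbar = statistics.fmean(z)    # arithmetic mean of log(sample)
    s2 = statistics.variance(z)   # sample variance of log(sample)

    return {
        "true_mean": math.exp(mu + V / 2),     # expectation of X
        "arithmetic": statistics.fmean(x),     # arithmetic mean of X
        "naive_antilog": math.exp(zbar),       # omits the factor exp(V/2)
        "corrected": math.exp(zbar + s2 / 2),  # plug-in with the factor
    }


if __name__ == "__main__":
    for name, value in compare_estimators().items():
        print(f"{name:>14}: {value:.4f}")
```

With these settings the naive antilog should land near exp(mu), well
below the true mean, while the arithmetic mean and the corrected value
should both land close to exp(mu + V/2).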

While there is nothing wrong with using the arithmetic mean of X
to estimate the mean of its lognormal distribution (i.e. it is
unbiased, no tendency to over-estimate), what is true is that the
values of X which are above the mean will indeed include values
which deviate much further from the mean than values which are
below the mean -- the lognormal distribution is positively skewed
(has a long upper tail).

So here is another possible confusion of interpretation:
In taking the arithmetic mean of a lognormal sample, you will
sometimes get a result which is a long way above the mean, compared
with results which are below the mean. That could also be an
interpretation of M. David's "the mean of a lognormal distribution
tends to be overestimated by using the arithmetic mean."

But this would not agree with the usual interpretation of "tends
to be overestimated", which would either be "the estimate has a
positive bias" (i.e. in the long run average gives a result which
is too large), or else "the estimate [while unbiased] is more often
above the mean than below it".

It can't agree with the first, because the arithmetic mean is
unbiased. And it can't agree with the second, because in fact
the probability that the arithmetic mean lies above the mean
(the expectation of the distribution) is less than 1/2: more
results will lie below the mean than lie above the mean (as needed
indeed to compensate for those occasional very high values).
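
Both points can be checked by repeated sampling. The following is a
minimal sketch, not a proof: the function name, the sample size
(n = 10), the number of repetitions, the parameters (mu = 0, V = 1)
and the seed are assumptions made for illustration.

```python
import math
import random
import statistics


def sample_mean_behaviour(mu=0.0, V=1.0, n=10, reps=20_000, seed=2006):
    """Draw `reps` lognormal samples of size `n` and summarise their means.

    Returns (true mean of X, average of the sample means,
    fraction of sample means lying above the true mean).
    """
    rng = random.Random(seed)
    sd = math.sqrt(V)
    true_mean = math.exp(mu + V / 2)

    means = []
    for _ in range(reps):
        sample = [math.exp(rng.gauss(mu, sd)) for _ in range(n)]
        means.append(statistics.fmean(sample))

    frac_above = sum(m > true_mean for m in means) / reps
    return true_mean, statistics.fmean(means), frac_above


if __name__ == "__main__":
    true_mean, avg_of_means, frac_above = sample_mean_behaviour()
    print(f"true mean of X          : {true_mean:.4f}")
    print(f"average of sample means : {avg_of_means:.4f}")
    print(f"fraction above true mean: {frac_above:.3f}")
```

The average of the sample means should sit close to the true mean
(no bias), while the fraction of sample means above the true mean
should come out below 1/2, as argued above.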

I can only conclude (without knowing M. David's "Geostatistical
Ore Reserve Estimation") that he is writing either in confusion
about correct statistical terminology, or in a loose way without
proper definition or explanation of what he means.

Hoping this helps!

And Happy New Year to all.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Dec-06                                       Time: 20:50:00
------------------------------ XFMail ------------------------------