On 30-Dec-06 Digby Millikan wrote:
> Hello,
> I looked in M.David's '""'
> and he says 'the mean of a lognormal distribution tends to be
> overestimated by using the arithmetic mean' so he goes on to
> provide a formula for estimating the mean which includes the
> variance. Reading this statement led me to wonder what is
> the mean? So I looked up the definition of 'mean' in my Cambridge
> Dictionary of Statistics and it gives the formula for the
> arithmetic mean. So what is the mean that Michel is referring to
> as compared to what the Cambridge dictionary of statistics is
> referring to? Does the mean Michel is referring to mean 'the most
> likely value'? We certainly do not use the arithmetic mean when
> we use lognormal kriging because we tend to overestimate values.
> I am a bit confused about the definition of mean, according to
> Michel, it is not the average, but our most likely value,
> is this correct?
The term "mean" (of a set of values), unless otherwise qualified (e.g. geometric mean, harmonic mean), refers to the arithmetic mean of the values. The median and the mode (the "most likely value") are not, strictly speaking, means, though in many circumstances they can be used as workable substitutes.

The mean of a distribution is the expected value of a number drawn from that distribution. In effect, you can look on it as the arithmetic mean of all possible values, each weighted by how often it occurs under the distribution. The mean (or expectation) of a distribution, provided it exists, is always estimated without bias by the arithmetic mean of a random sample from that distribution. Hence there can be no "tendency to overestimation" by the arithmetic mean. Therefore what M. David means is probably something else, and the following is a possible (and indeed plausible) interpretation.

A variable which has a lognormal distribution is such that its natural logarithm (to base e) has a normal distribution. Equivalently, the exponential of a normally distributed variable has a lognormal distribution. These are equivalent forms of the *definition* of "lognormal". Now, the normal distribution occurring in the definition has, of course, a mean and a variance as parameters which fix which normal distribution it is. If, therefore, Z has a normal distribution with mean mu and variance V, then X = exp(Z) has a lognormal distribution with parameters mu and V, which are the mean and variance of the "underlying" normal distribution. Conversely, Z = log(X) has a normal distribution N(mu,V).

Now: the expected value (the "real" mean) of X is given by exp(mu + V/2), where, again, mu is the "underlying" mean. This is certainly larger than mu (the mean of the underlying normal): since exp(x) >= 1 + x for every x, it is at least 1 + mu + V/2, which exceeds mu. It is also larger than both the median of X, which is exp(mu), and the mode of X (the "most likely value"), which is exp(mu - V). So maybe there is a confusion over which mean is meant: the mean of Z, or the mean of X. But there is also another possible confusion.
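The distinctions above, and the sampling behaviour of the arithmetic mean discussed below, can be checked numerically. What follows is a minimal sketch using only Python's standard library; the parameter values (mu = 0, V = 1), the sample size n = 10, the number of trials and the seed are arbitrary assumptions chosen for illustration, and the function names are hypothetical. Note that `random.lognormvariate` takes the standard deviation of the underlying normal, not its variance. The sketch lists the theoretical mode, median and mean of X, then repeatedly draws small samples and compares the arithmetic mean of X with the naive back-transform exp(mean of log X).

```python
import math
import random
import statistics


def lognormal_summaries(mu: float, v: float) -> dict:
    """Theoretical values for X = exp(Z), Z ~ N(mu, v); v is the VARIANCE of Z."""
    return {
        "mean_of_Z": mu,
        "mode_of_X": math.exp(mu - v),        # the "most likely value"
        "median_of_X": math.exp(mu),          # naive "antilog" of the mean of Z
        "mean_of_X": math.exp(mu + v / 2.0),  # the expectation E[X]
    }


def repeated_sampling(mu: float, v: float, n: int, trials: int, seed: int = 1) -> dict:
    """Draw `trials` independent samples of size n and average two estimators of E[X]."""
    rng = random.Random(seed)
    sigma = math.sqrt(v)  # lognormvariate expects the standard deviation of Z
    true_mean = math.exp(mu + v / 2.0)
    arith, back = [], []
    above = 0
    for _ in range(trials):
        xs = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        a = statistics.fmean(xs)                                  # arithmetic mean of X
        g = math.exp(statistics.fmean([math.log(x) for x in xs]))  # exp(mean of log X)
        arith.append(a)
        back.append(g)
        if a > true_mean:
            above += 1
    return {
        "true_mean": true_mean,
        "avg_arithmetic_mean": statistics.fmean(arith),
        "avg_back_transform": statistics.fmean(back),
        "frac_arithmetic_above_true": above / trials,
    }


if __name__ == "__main__":
    print(lognormal_summaries(0.0, 1.0))
    print(repeated_sampling(0.0, 1.0, n=10, trials=20000))
```

With mu = 0 and V = 1, the arithmetic means averaged over many samples should sit close to exp(0.5), about 1.65, while the averaged back-transform should sit near exp(V/(2n)), about 1.05, far below the true mean. Fewer than half of the individual arithmetic means should exceed the true mean, reflecting the long upper tail.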
Many people have the impression that, because X = exp(Z), if you estimate the mean of Z (which is mu) you can recover the mean of X by "antilog", i.e. exp(mu). Since log(X) is normal, its mean (mu) is estimated without bias by the arithmetic mean of the values of log(X) in a sample. So, the reasoning goes, "the mean of X is exp(arithmetic mean of log(sample))." But this is wrong, since you also need the extra factor exp(V/2). (The quantity exp(arithmetic mean of log(sample)) is the geometric mean of the sample; it estimates exp(mu), the median of X, not its mean.)

While there is nothing wrong with using the arithmetic mean of X to estimate the mean of its lognormal distribution (it is unbiased, with no tendency to over-estimate), what is true is that the values of X which are above the mean will include values which deviate much further from the mean than the values which are below it: the lognormal distribution is positively skewed (it has a long upper tail).

So here is another possible confusion of interpretation. In taking the arithmetic mean of a lognormal sample, you will sometimes get a result which is a long way above the mean, compared with results which are below the mean. That could also be an interpretation of M. David's "the mean of a lognormal distribution tends to be overestimated by using the arithmetic mean." But this would not agree with the usual interpretation of "tends to be overestimated", which would be either "the estimate has a positive bias" (i.e. in the long run its average gives a result which is too large), or else "the estimate [while unbiased] is more often above the mean than below it". It can't agree with the first, because the arithmetic mean is unbiased. And it can't agree with the second, because in fact the probability that the arithmetic mean lies above the mean (the expectation of the distribution) is less than 1/2: more results will lie below the mean than above it (as is indeed needed to compensate for those occasional very high values).

I can only conclude (without knowing M.
David's "Geostatistical Ore Reserve Estimation") that he is writing either in confusion about correct statistical terminology, or in a loose way without proper definition or explanation of what he means.

Hoping this helps! And Happy New Year to all.
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Dec-06 Time: 20:50:00

+
+ To post a message to the list, send it to [email protected]
+ To unsubscribe, send email to majordomo@ jrc.it with no subject and "unsubscribe ai-geostats" in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
+ As a general service to list users, please remember to post a summary of any useful responses to your questions.
+ Support to the forum can be found at http://www.ai-geostats.org/
