> > though.  So outside about 14 sigmas you should be able to say the
> > probability is below 10^-40. The problem is that if there are small
> > deviations from "Gaussian-ness" way out on the wings of your distribution,
> > the REAL probability of a certain result is not well approximated by the
> > Error Function result.
> 
> This is a real good point - if we are assuming a Gaussian distribution, then
> we are assuming the best case. The worst case is given by Tchebycheff's
> theorem, which states that, given a probability distribution where only the
> mean and standard deviation are known, then the probability that an
> observation will fall more than x standard deviations from the mean is
> bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but
> with a very unlikely distribution). In other words, if you have no
> guarantees about the distribution, "counting sigmas" is going to give you a
> false sense of security, and if the distribution is even slightly deviant
> from Gaussian, then the result can be very wrong indeed.

This is all fine - except the assertion about a Gaussian distribution 
being the best case.
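For concreteness, here is a small sketch (my own, not from the quoted post) putting numbers to the two cases being contrasted: the Tchebycheff worst-case bound of 1/x^2 against the two-sided Gaussian tail probability, erfc(x / sqrt(2)):

```python
import math

# Worst-case (Tchebycheff) tail bound vs. the Gaussian tail, at a few
# sigma levels. The gap shows how much "counting sigmas" depends on
# actually having a Gaussian distribution.
for x in (2, 3, 5, 10):
    chebyshev = 1.0 / (x * x)                 # bound: P(|X-mu| > x*sigma) <= 1/x^2
    gaussian = math.erfc(x / math.sqrt(2.0))  # exact two-sided Gaussian tail
    print(f"{x} sigma: Chebyshev <= {chebyshev:.4g}, Gaussian = {gaussian:.3e}")
```

At 3 sigmas the bound is about 0.11 while the Gaussian tail is about 0.0027; by 10 sigmas the bound is still 0.01 while the Gaussian tail is astronomically small.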

Take a rectangular distribution - a continuous variable is equally 
likely to have any value between 0 and 1, but never less than 0 nor 
greater than 1.

This distribution has a mean of 0.5 and a standard deviation of 
0.289 (variance = 1/12), so _all_ observations taken from this 
distribution are well within 2 sigmas of the mean.
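A quick numeric check of the rectangular case (my own sketch): the largest possible deviation, at x = 0 or x = 1, is 0.5 / sqrt(1/12) = sqrt(3), about 1.73 sigmas, so no observation can ever reach 2 sigmas.

```python
import math
import random

# Uniform distribution on [0, 1]: mean 0.5, sigma = sqrt(1/12) ~= 0.289.
mean = 0.5
sigma = math.sqrt(1.0 / 12.0)

# Worst possible deviation in sigma units: sqrt(3) ~= 1.732.
max_sigmas = 0.5 / sigma
print(max_sigmas)

# Empirical check: every sample falls within 2 sigmas of the mean.
random.seed(1)
samples = [random.random() for _ in range(100_000)]
assert all(abs(x - mean) <= 2.0 * sigma for x in samples)
```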

In fact, any distribution with negative kurtosis will be a better case 
than a Gaussian distribution in this respect, provided that it is 
symmetric (zero skew) and reasonably well behaved.

On numeric grounds, we should expect the distribution of the 
underlying errors to have a negative kurtosis. This is because of the 
finite precision of the result - the data points themselves are 
uncertain to an extent, e.g. if the data is accurate to only 4 bits 
of precision, then a "true" value of 0.1 could be recorded as either 
0.0625 or 0.1250. This effect tends to reduce the kurtosis (the 
central "hump" of the distribution is flattened and broadened - this 
causes an overestimation of the population standard deviation from 
a sample of observations, however large).
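The quantization effect can be sketched numerically (an illustration of my own, under assumed parameters: a Gaussian with sigma = 0.05 recorded on the 4-bit grid of multiples of 1/16 = 0.0625, so that e.g. 0.1 lands on 0.0625 or 0.1250). Rounding behaves roughly like added uniform noise: the sample standard deviation is inflated, and the kurtosis drops below that of the clean data.

```python
import math
from statistics import NormalDist

STEP = 1.0 / 16.0  # 4-bit grid: values recorded as multiples of 0.0625

def quantize(x):
    return round(x / STEP) * STEP

def std_and_kurtosis(xs):
    # Plain moment estimates; kurtosis m4/m2^2 is ~3 for a Gaussian.
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return math.sqrt(m2), m4 / (m2 * m2)

# Deterministic near-Gaussian sample built from equally spaced quantiles.
nd = NormalDist(mu=0.5, sigma=0.05)
true_vals = [nd.inv_cdf((i + 0.5) / 100_000) for i in range(100_000)]
recorded = [quantize(x) for x in true_vals]

sd_true, kurt_true = std_and_kurtosis(true_vals)
sd_rec, kurt_rec = std_and_kurtosis(recorded)
print(sd_true, sd_rec)      # recorded std is inflated (~0.053 vs ~0.050)
print(kurt_true, kurt_rec)  # recorded kurtosis falls below the clean value
```

The standard-deviation inflation is essentially Sheppard's correction (added variance of about STEP^2/12), which is the overestimation described above.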


Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
