Hi George

> For this release, I thought it would be nice to use a more
> scientific approach.  What I've done is run almost 10000 iterations
> of an exponent and used Excel to graph the number of iterations with
> an error between 0.200 and 0.201, 0.201 and 0.202, etc.  As you can
> imagine you get a Bell-like curve that peaks at some value.
>
> Since it has been 25 years since I took statistics courses
> in college, I am at a loss as to how to use this graph to predict the
> probability that N iterations will have an error below M.  I could
> then use such information to find the exponent such that the chance
> of a convolution error exceeding 0.5 is less than 1 in 10^40 (or some
> other criteria that we agree is "safe").

It's been a while since I took statistics too, but it's worth a try. In
theory, what you need to do is calculate the mean and variance of your
sample and use those to estimate the mean and variance of the entire
distribution. The mean is easy: the sample mean is your estimator. The
variance is harder; the unbiased estimator is n/(n-1) times the sample
variance. Taking the square root of that gives an estimate of the standard
deviation. Finally, we need to verify that 0.5 is "several" standard
deviations from the mean before we can state, with any degree of
confidence, that 0.5 won't occur.
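As a sketch, here's how that estimation might look in code. The sample
values below are made up purely for illustration; in practice you'd feed in
your ~10000 measured errors:

```python
import math
import statistics

# Hypothetical sample of per-iteration convolution errors (illustrative only).
errors = [0.231, 0.245, 0.228, 0.252, 0.239, 0.247, 0.233, 0.241]

mean = statistics.mean(errors)     # sample mean estimates the true mean
var = statistics.variance(errors)  # n/(n-1)-corrected (unbiased) variance
sigma = math.sqrt(var)             # estimated standard deviation

# How many estimated standard deviations is 0.5 from the mean?
distance = (0.5 - mean) / sigma
print(f"mean={mean:.4f}  sigma={sigma:.4f}  distance={distance:.1f} sigmas")
```

Note that statistics.variance already applies the n/(n-1) correction, so no
extra factor is needed.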

How many is "several"? To calculate this, assume a Gaussian distribution.
The density of a distribution with mean zero and standard deviation 1 is
f(x) = (1/sqrt(2 pi)) exp(-x^2/2). Its integral (the error function) has no
closed form, but the trick is to multiply two copies of the density, in x
and in y, and integrate using polar coordinates. If we integrate within a
circle of radius R, the integral is larger than the probability that both x
and y are less than R/sqrt(2) in absolute value, but smaller than the
probability that they are both less than R in absolute value. That integral
evaluates to 1 - exp(-R^2/2). Remember that this probability applies to
both x and y, so we take its square root:

P(-R < x < R) > sqrt(1 - exp(-R^2/2))
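A quick numerical sanity check of this bound, using nothing beyond the
standard library: for a standard normal, P(-R < x < R) = erf(R/sqrt(2)), so
we can compare it directly against sqrt(1 - exp(-R^2/2)):

```python
import math

# Compare the exact two-sided normal probability against the polar-trick
# lower bound for a few values of R.
for R in [0.5, 1.0, 2.0, 3.0, 4.0]:
    exact = math.erf(R / math.sqrt(2))           # P(-R < x < R) for N(0, 1)
    bound = math.sqrt(1 - math.exp(-R * R / 2))  # lower bound from above
    assert exact > bound
    print(f"R={R:3.1f}  exact={exact:.10f}  bound={bound:.10f}")
```

(For very large R both quantities round to 1.0 in double precision, so the
comparison is only meaningful for moderate R.)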

We require this to hold for all N iterations, which happens with
probability at least

(1 - exp(-R^2/2))^(N/2), which is about 1 - (N/2) exp(-R^2/2).

Setting this probability equal to 1 - 10^-40 gives the result you need:
R about sqrt(2 (40 ln 10 + ln(N/2))) will do. In other words, you need to
check that 0.5 is more than this number of standard deviations from the
mean in order to get the required probability.
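The cutoff formula is easy to evaluate. A small sketch, where I've assumed
N is on the order of the exponent itself (1,345,000 here, just to match the
figures further on):

```python
import math

def cutoff_sigmas(N, digits=40):
    """Standard deviations needed so that N iterations all stay below the
    critical error with probability about 1 - 10**-digits, i.e.
    R = sqrt(2 * (digits * ln 10 + ln(N/2)))."""
    return math.sqrt(2 * (digits * math.log(10) + math.log(N / 2)))

# Assumption: N roughly equal to the exponent being tested.
print(cutoff_sigmas(1_345_000))  # roughly 14.5
```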

I tried this with the data you suggested. I found the mean was over 36
standard deviations from the critical point 0.5 in the 1,345,000 case (so
probably OK), but only 22 standard deviations in the 1,360,000 case, while
the cutoff as computed above was about 14.52 for each. Of course, these
estimates for the mean and standard deviation *also* need to be tested to
ensure that they are correct with probability 1 - 10^-40, a step I didn't
do, but no doubt its effect is to increase the value of R in the formula
above.

No doubt someone with a little more statistics will be able to explain how
these results are affected by the assumption of a Gaussian distribution;
14.52 is no doubt a best-case figure. Perhaps a cutoff of 20 standard
deviations would allow some tolerance.

Looks plausible, though.

Chris Nash
Lexington KY
UNITED STATES
=======================================================
Co-discoverer of the 7th and 10th largest known primes.

