George,
Thanks for sending me the raw data for the round-off error distribution.
The empirical data fit the cumulative distribution function (introduced by
Brian Beesley) reasonably well for a value of N of several hundred to a
thousand, which matches my understanding of roughly how many round-offs each
of your points is the maximum of, so that seems to bode well. Both the
theoretical curve and your empirical data (when plotted as one minus the
cumulative distribution function, on a log scale, versus round-off error)
are nearly linear once you get past the "breaking point" (essentially the
peak of your distribution). This means that you can use a VERY EASY, yet
decent, approximation to estimate the probability you seek, i.e. that some
roundoff error greater than 0.5 occurs during an LL test. If you run a
test of 10,000 iterations and tabulate the histogram of maximum errors as
in your raw data, the probability you seek is ~:
P * 10^(-3 * (0.5 - A) / (B - A))
where:
P = the Mersenne exponent under test
A = the peak of the distribution (the most frequent maximum round off error)
B = the TENTH largest round off error of the 10,000 iterations (i.e. where
1 minus the cumulative probability distribution is ~ 10^-3)
This estimate would appear to be appropriately conservative, since the
deviation from linearity is such that the real curve bends down to lower
probability than this linear estimate.
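As a sketch of the recipe (in Python; the function name, bin width, and
synthetic data below are mine, not from any existing GIMPS code), A and B
could be pulled out of the 10,000 tabulated maximum errors like this:

```python
def peak_and_tail(max_errors, bin_width=0.01):
    """Given the per-iteration maximum round-off errors from a
    10,000-iteration test, return A (the peak of the distribution,
    taken as the midpoint of the most populated histogram bin) and
    B (the tenth-largest error, i.e. where one minus the cumulative
    distribution is ~ 10^-3)."""
    counts = {}
    for err in max_errors:
        bin_index = int(err / bin_width)
        counts[bin_index] = counts.get(bin_index, 0) + 1
    peak_bin = max(counts, key=counts.get)
    a = (peak_bin + 0.5) * bin_width
    b = sorted(max_errors)[-10]
    return a, b
```

With A and B in hand, the formula above is one line of arithmetic.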
For your specific examples:
P = 1,345,000
A = 0.177
B = 0.245
Estimated probability of roundoff error for entire LL test ~ 8e-9
And
P = 1,360,000
A = 0.250
B = 0.375
Estimated probability of roundoff error for entire LL test = 1.4
The "probability" being greater than one comes because the "small
probability" condition for the estimate is violated
[ 1 - (1 - e)^ P ~ P * e only for e << 1/P ]
We don't care about this regime; the error probability is far too great anyway.
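The arithmetic for both examples can be checked in a few lines of Python (a
sketch; the function name is mine):

```python
def estimated_prob(p, a, b):
    """Linear log-tail estimate: P * 10^(-3 * (0.5 - A) / (B - A))."""
    return p * 10 ** (-3 * (0.5 - a) / (b - a))

print(estimated_prob(1_345_000, 0.177, 0.245))  # ~ 8e-9
print(estimated_prob(1_360_000, 0.250, 0.375))  # ~ 1.4 (the "small probability" condition is violated)
```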
By the way, I bet that you could do interpolation/extrapolation (in Log
space) to determine the "cutoff" value without having to run the 10,000
iteration test for ALL candidates.
Log10 (Prob(1,345,000)) ~ -8
Log10 (Prob(1,600,000)) ~ 0
So, for example, I bet that the probability of roundoff error for an entire
LL test at P = 1,472,500 (halfway between exponents) is ~ 10^-4
(logarithmically halfway between probabilities).
You probably already have the raw data to test this out.
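That interpolation is just linear in (P, Log10 probability) space; a sketch
(the function name is mine):

```python
def interp_log10_prob(p, p1, log_prob1, p2, log_prob2):
    """Linearly interpolate log10(probability of a roundoff error in an
    entire LL test) between two exponents with known estimates."""
    frac = (p - p1) / (p2 - p1)
    return log_prob1 + frac * (log_prob2 - log_prob1)

# Using the two anchor points quoted above:
print(interp_log10_prob(1_472_500, 1_345_000, -8.0, 1_600_000, 0.0))  # -4.0
```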
I hope this can let you SIMPLY estimate the relevant probabilities with
some degree of confidence in the estimates (pseudo rigorous).
Sincerely,
Todd Sauke
ps. To avoid the "small probability" requirement in the probability
estimate you could just use the equation [ 1 - (1 - 10^(-3 * (0.5 - A) / (B
- A)))^ P ] but DON'T use this if you want to do the logarithmic
interpolation using any numbers that come close to (or over) 1. The
"incorrect" estimate is what works best for the logarithmic interpolation!
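To illustrate the ps numerically (a Python sketch, names mine): the exact
form saturates below 1 where the linear estimate does not, and the two agree
closely whenever the linear estimate is small.

```python
def per_iteration_prob(a, b):
    # e = 10^(-3 * (0.5 - A) / (B - A)), the estimated per-iteration
    # probability of a roundoff error greater than 0.5
    return 10 ** (-3 * (0.5 - a) / (b - a))

def exact_prob(p, a, b):
    # 1 - (1 - e)^P: valid even when P * e is not small
    return 1 - (1 - per_iteration_prob(a, b)) ** p

def linear_prob(p, a, b):
    # P * e: good only for e << 1/P, but the right form for the
    # logarithmic interpolation
    return p * per_iteration_prob(a, b)

print(linear_prob(1_360_000, 0.250, 0.375))  # ~ 1.4, not a real probability
print(exact_prob(1_360_000, 0.250, 0.375))   # ~ 0.74, a real probability
```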
TS
>George,
>
>Given the uncertainty about the actual probability distribution function
>involved it seems that the best way to estimate the probability of having
>no roundoff greater than 0.5 in one iteration is to simply plot one minus
>your empirical cumulative distribution function (on a log scale) versus
>round off error on a scale that goes out to 0.5, and extrapolate by
>"eyeball" to see where the curve would hit 0.5 and if it would be below
>1e-10 or 1e-15 or whatever. Wherever the curve hits (or appears it would
>hit) 0.5, times the number of needed LL iterations (p), gives an empirical
>estimate of the probability that there will be some roundoff greater than 0.5
>(for small values of probability, of course).
>
>It is likely that some (few?) individual people who are "into" this could
>use the raw Excel data numbers, not just the plots you pointed us to (which
>I still can't make my computer see.) I would appreciate it if you could
>ship me one or two files with probability histograms (or occurrence density
>or whatever) versus roundoff value that could be loaded into Excel 4.0 for
>Mac.
>
>It might be interesting to see if the shape of this log-lin plot follows
>the one given by the cumulative distribution function a la Brian Beesley.
>This function seems to have one degree of freedom, the number of rounded
>off values of which the maximum one gives your function, which in his
>example he chose to be 100. You probably know from the code what that
>number should be. (Another degree of freedom is the underlying uncertainty
>or the "scaling factor" for the x axis.) You might also know from the code
>what the "underlying" sigma should be under the assumption that it is
>something like (sigma0 * SquareRoot(n)) where n is the number of roundings
>that occur in the cumulative calculation and sigma0 is the typical
>"elementary" round-off error (scaled properly to be in the same units as
>the 0.5 critical roundoff limit). If this is not available a priori from
>the code, it would still be interesting to estimate it by finding the best
>fit to your empirical cumulative distribution function. If the resulting
>smooth cumulative distribution function is a good match to the empirical
>one it could serve as a guide for better extrapolation (or instead of the
>extrapolation). If it is not a good match it would serve as a "warning"
>that there may be "funny" things going on with the errors (particularly
>worrisome if the "tail" is trending up, implying larger errors are more
>likely than indicated by theory).
>
>Todd Sauke
>________________________________________________________________
>Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm