Mersenne Digest          Monday, May 10 1999          Volume 01 : Number 554




----------------------------------------------------------------------

Date: Sat, 08 May 1999 12:00:55 -0400
From: Chris Nash <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

> though.  So outside about 14 sigmas you should be able to say the
> probability is below 10e-40. The problem is that if there are small
> deviations from "Gaussian-ness" way out on the wings of your distribution,
> the REAL probability of a certain result is not well approximated by the
> Error Function result.

This is a real good point - if we are assuming a Gaussian distribution, then
we are assuming the best case. The worst case is given by Tchebycheff's
theorem, which states that, given a probability distribution where only the
mean and standard deviation are known, then the probability that an
observation will fall more than x standard deviations from the mean is
bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but
with a very unlikely distribution). In other words, if you have no
guarantees about the distribution, "counting sigmas" is going to give you a
false sense of security, and if the distribution is even slightly deviant
from Gaussian, then the result can be very wrong indeed.
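The worst case Chris describes can be made concrete. The distribution attaining Chebyshev's bound puts probability 1/(2k^2) at each of +k and -k sigmas and the rest at the mean, which gives a tail probability of exactly 1/k^2 - vastly larger than the Gaussian tail at the same number of sigmas. A quick numerical sketch (illustrative only):

```python
import math

# Extremal distribution for Chebyshev's inequality: mass 1/(2*k*k) at
# each of +k and -k (in sigma units), the rest at 0.  It has mean 0 and
# variance 1, and P(|X| >= k sigma) is exactly 1/k^2.
def chebyshev_worst_tail(k):
    return 1.0 / (k * k)

def gaussian_tail(k):
    # Two-sided normal tail P(|X| >= k sigma), via the complementary
    # error function.
    return math.erfc(k / math.sqrt(2))

for k in (3, 5, 14):
    print(k, chebyshev_worst_tail(k), gaussian_tail(k))
```

At 14 sigmas the Gaussian tail is indeed below 10^-40, while the Chebyshev bound only guarantees about 1/196 - which is the gap between "assuming Gaussian" and "assuming nothing".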

However we do have a little comfort. George's function - maximum convolution
error from a single iteration - does have some distribution. It "appears"
bell-shaped, and looks like a normal distribution, the samples George has
made are also reasonably-sized. Not only that, each observation is actually
the maximum of the deviations from integer over many channels in the FFT -
in other words, the observations are already coming from a "smoothed"
distribution. I'm sure the distribution of a maximum of several observations
is something which there is a statistical test for. (The resulting
distribution may not be normal, but may have an analogous test).

Thanks to Todd for backing up the heuristic calculation of "14 sigmas" that
I got the hard way :)

Chris


________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sat, 08 May 1999 18:17:48 -0500
From: Ken Kriesel <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

Aren't gaussians symmetric about the mean value?  What George plotted is not.


At 12:00 PM 1999/05/08 -0400, you wrote:
>
>> though.  So outside about 14 sigmas you should be able to say the
>> probability is below 10e-40. The problem is that if there are small
>> deviations from "Gaussian-ness" way out on the wings of your distribution,
>> the REAL probability of a certain result is not well approximated by the
>> Error Function result.
>
>This is a real good point - if we are assuming a Gaussian distribution, then
>we are assuming the best case. The worst case is given by Tchebycheff's
>theorem, which states that, given a probability distribution where only the
>mean and standard deviation are known, then the probability that an
>observation will fall more than x standard deviations from the mean is
>bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but
>with a very unlikely distribution). In other words, if you have no
>guarantees about the distribution, "counting sigmas" is going to give you a
>false sense of security, and if the distribution is even slightly deviant
>from Gaussian, then the result can be very wrong indeed.
>
>However we do have a little comfort. George's function - maximum convolution
>error from a single iteration - does have some distribution. It "appears"
>bell-shaped, and looks like a normal distribution, the samples George has
>made are also reasonably-sized. Not only that, each observation is actually
>the maximum of the deviations from integer over many channels in the FFT -
>in other words, the observations are already coming from a "smoothed"
>distribution. I'm sure the distribution of a maximum of several observations
>is something which there is a statistical test for. (The resulting
>distribution may not be normal, but may have an analogous test).
>
>Thanks to Todd for backing up the heuristic calculation of "14 sigmas" that
>I got the hard way :)
>
>Chris
>
>
>________________________________________________________________
>Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
>
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sat, 08 May 1999 20:55:17 -0400
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

At 12:00 PM 5/8/99 -0400, Chris Nash wrote:

>This is a real good point - if we are assuming a Gaussian distribution, then
>we are assuming the best case. The worst case is given by Tchebycheff's
>theorem, which states that, given a probability distribution where only the
>mean and standard deviation are known, then the probability that an
>observation will fall more than x standard deviations from the mean is
>bounded above *only* by 1/x^2. 

Yes, but that is a really bad worst case.  Pathological even.  Looking at the
graphs of the two data sets, you can tell that it is reasonably well behaved.


+-------------------------------------------+
| Jud McCranie  [EMAIL PROTECTED] |
|                                           |
| I have macular stars in my eyes.          |
+-------------------------------------------+


________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sat, 08 May 1999 21:01:06 -0400
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

At 06:17 PM 5/8/99 -0500, Ken Kriesel wrote:
>Aren't gaussians symmetric about the mean value?  What George plotted is not.


Yes, but it isn't too far off. It does drop off quite a bit faster on the
left, though. That is why I think it may be better to chop the data at the
mean, throw away the data below the mean, reflect the remaining data with
respect to the mean, and calculate the standard deviation from that
"massaged" data. Then a normal distribution should approximate it pretty
well.
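Jud's folding procedure is easy to sketch. The data below is synthetic (a right-skewed max of uniforms standing in for George's error measurements), just to show the mechanics:

```python
import random
import statistics

# "Fold at the mean": drop observations below the mean, mirror the rest
# to the other side, and estimate sigma from the symmetrized sample.
# The data here is synthetic, not George's actual measurements.
random.seed(1)
data = [max(random.random() for _ in range(8)) for _ in range(10000)]

m = statistics.fmean(data)
upper = [x for x in data if x >= m]
folded = upper + [2 * m - x for x in upper]   # reflect about the mean
sigma = statistics.pstdev(folded)
print(round(m, 3), round(sigma, 3))
```

By construction the folded sample is exactly symmetric about the mean, so a normal fit to it is at least self-consistent.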


+-------------------------------------------+
| Jud McCranie  [EMAIL PROTECTED] |
|                                           |
| I have macular stars in my eyes.          |
+-------------------------------------------+


________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sun, 9 May 1999 08:45:11 -0000
From: [EMAIL PROTECTED]
Subject: Re: Mersenne: Any statistics majors out there?

Hi,

I have an honours degree in mathematics with a fair amount of
statistics content; unfortunately I haven't used the statistics
much since I graduated in 1975, so feel free to shoot down what
I have to say if you think I'm wrong.

Firstly, what we are measuring is the *maximum* value from a
sequence of observations; therefore we would not expect the
observed distribution to be normal, even if the underlying
distribution were. I would expect the distribution to have a
positive skew (i.e. to be "squeezed" on the side below the
mean, and "stretched" on the side above the mean).

As an example, I'm going to stick to numbers small enough to
check easily. The probabilities come from a standard set of
tables of the normal distribution, to four significant figures.

For _any_ distribution, if the probability that a random variable
is less than x is p, then the probability that all of a set of
N random variables from the same distribution are all less than
x is p^N, i.e. the probability that at least one of the N random
variables will be greater than or equal to x is 1 - p^N

Suppose that the random variables come from a normal distribution
with mean 0 and standard deviation 1. However we take the absolute
value of the variables - the distribution is symmetric about zero,
so this means that we are dealing with twice the integral from 0
to +infinity instead of the integral from -infinity to +infinity.

We take the maximum of 100 random variables drawn from this "half-
normal" distribution and wish to look at the distribution of the
new variable represented by this maximum function.

The following tabulated values should illustrate the shape of
the distribution. The left hand column is the value of x, the second
column is the value of the cumulative distribution function of the
normal distribution (from standard stats tables), the third value
is the corrected CDF for the upper half of the distribution, and the
fourth value is the CDF of the distribution of the maxima of 100
observations.

1.0   0.8413   0.6826   0.000000
1.1   0.8643   0.7286   0.000000
1.2   0.8849   0.7698   0.000000
1.3   0.9032   0.8064   0.000000
1.4   0.9192   0.8384   0.000000
1.5   0.9332   0.8664   0.000001
1.6   0.9452   0.8904   0.000009
1.7   0.9554   0.9108   0.000088
1.8   0.9641   0.9282   0.000581
1.9   0.9713   0.9426   0.002709
2.0   0.9772   0.9544   0.009398
2.1   0.9821   0.9642   0.026104
2.2   0.9861   0.9722   0.059643
2.3   0.9893   0.9786   0.114953
2.4   0.9918   0.9836   0.191360
2.5   0.9938   0.9876   0.287150
2.6   0.9953   0.9906   0.388895
2.7   0.9965   0.9930   0.495364
2.8   0.9974   0.9948   0.593715
2.9   0.9981   0.9962   0.683367
3.0   0.9986   0.9972   0.755487
3.1   0.9990   0.9980   0.818567
3.2   0.9993   0.9986   0.869273
3.3   0.9995   0.9990   0.904792
3.4   0.9997   0.9994   0.941748
3.5   0.9998   0.9996   0.960782

... at which point the four-digit precision of the tables becomes
insufficient to make it worthwhile tabulating more data.

But I think it's clear that the CDF rises more slowly on the side
above the median (just above 2.7) than it does on the lower side.

(Sorry I chose 100 now - there is an unfortunate coincidence between
"just above 2.7" and e, which is accidental - in fact the median is
more accurately 2.705)
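Brian's table can be reproduced directly: the "corrected CDF" of the half-normal distribution is erf(x/sqrt(2)), and the CDF of the maximum of 100 observations is that raised to the 100th power. A short check (small differences from the table come from its four-figure inputs):

```python
import math

def halfnormal_cdf(x):
    # P(|Z| < x) for Z ~ N(0,1): the "corrected CDF" column above.
    return math.erf(x / math.sqrt(2))

def max_of_100_cdf(x):
    # P(max of 100 independent |Z| observations < x) = P(|Z| < x)^100
    return halfnormal_cdf(x) ** 100

for x in (2.0, 2.7, 3.5):
    print(x, round(max_of_100_cdf(x), 6))
```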

What does all this mean? Unfortunately I can't see much, except that
given a value which is the maximum of a set of 100 data values drawn
from a population of positive real values, we can (swallowing hard
and trusting in the central limit theorem) deduce that the standard
deviation of the underlying distribution is about 1/2.7 of the given
value.

Perhaps this is therefore the wrong angle to attack the problem from.

....................................................................

Could I suggest that the maximum convolution error is likely to occur
when whole blocks of bits are 1, thus forcing large integers into the
transform so that rounding/truncation errors are most likely to occur.

Consider the LL test algorithm x' = (x^2 - 2) (mod 2^p - 1)

If x = 1, then x' = -1, x'' = +1, x''' = -1, etc, etc, for all values
of p. And -1 (mod 2^p - 1) = (2^p - 2) (mod 2^p -1). Furthermore
2^p - 2 = 2 * (2^(p-1) - 1), a number which contains (p-1) consecutive
1 bits.

So running a couple of iterations of the LL test algorithm starting
at 1 (instead of 4), looking at the max convolution error and checking
that the value of the residual after 2 iterations is 1, might be
instructive ... ?
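The arithmetic claim above is easy to check in a few lines (a small p is used purely for illustration):

```python
# Starting the LL iteration at 1 instead of 4, every later value is
# -1 = 2^p - 2 (mod 2^p - 1), whose binary expansion has p-1 consecutive
# 1 bits.  p = 13 is just a small example.
p = 13
M = (1 << p) - 1            # 2^p - 1

x = 1
for _ in range(5):
    x = (x * x - 2) % M     # one LL step
    assert x == M - 1       # -1 mod M at every iteration

print(bin(M - 1))           # 0b1111111111110: twelve 1 bits, then a 0
```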

Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sun, 09 May 1999 12:40:19 +0200
From: Guillermo Ballester Valor <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

Hi to all:

George Woltman wrote:

> 
> P.S.  A brief explanation of convolution errors:  Since the FFT in prime95
> is done in floating point arithmetic, round-off errors build up.  After the
> FFT, every value is rounded back to an integer.  The convolution error is
> the largest amount that a result value differs from the correct integer
> value.  If the convolution error exceeds 0.5, then the round-back-to-integer
> step will round to the wrong integer value - a catastrophic failure.
> 
> Another way to look at this:  Each FFT value contains about 20 bits of data.
> That is, values ranging from 2^19 to -2^19.  Consider an FFT of 128K elements.
> If each FFT value was close to the 2^19 value, then each result value would
> need 19+19+17=55 bits to hold the result.  Since a float only holds 53 bits
> there would be a loss of information.  Fortunately, probability assures us
> that it is extremely unlikely that every (or even most) FFT values will
> be close to the 2^19 maximum value.
> 

I'm not sure I've understood that correctly. The FFT is a good method to
evaluate convolution products in O(n log n) time instead of O(n^2).
Well, let M(p) be prime; then if the first term in the L-L test is
L(0)=4, we get L(p-2)=0 mod M(p). We also get L(p-3)*L(p-3)=2 mod M(p),
which has two solutions. One is 2^((p+1)/2), whose binary expansion has
only one '1' bit, all remaining bits being 0. The last convolution would
be very easy.

 *BUT* what if L(p-3) is -2^((p+1)/2)?
Then all bits in L(p-3) are '1' except a single '0'. If we perform the
convolution for L(p-3)*L(p-3), we can have problems if we don't have
enough bits. MURPHY'S THEOREM: we will fail on the last iteration and
let a big prime - and $50,000 - slip away. :-(


Most likely the internal representation at each iteration is not the raw
bits of the binary expansion in float form (I haven't studied in detail
the FFT George uses), so don't worry. ;-)
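Guillermo's two square roots of 2 are quickly verified (again with a small p for illustration):

```python
# Both +/- 2^((p+1)/2) square to 2 mod M(p) = 2^p - 1, and the negative
# root's binary expansion is all 1 bits except a single 0.
p = 13                      # p must be odd so (p+1)/2 is an integer
M = (1 << p) - 1
r = 1 << ((p + 1) // 2)     # +2^((p+1)/2): a single 1 bit
s = M - r                   # -2^((p+1)/2) mod M

assert (r * r) % M == 2
assert (s * s) % M == 2
print(bin(r), bin(s))       # 0b10000000 0b1111101111111
```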
 

             
- ----------------------------------
| Guillermo Ballester Valor       |  
| [EMAIL PROTECTED]                      |  
| c/ cordoba, 19                  |
| 18151-Ogijares (Spain)          |
- ----------------------------------
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sun, 9 May 1999 11:45:51 -0300 (EST)
From: "Nicolau C. Saldanha" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

On Fri, 7 May 1999, George Woltman wrote:

> Hi all,
> 
>       I'm working on version 19 of prime95 and I need your help.
> In the past, the exponents at which a larger FFT is used was picked
> rather haphazardly.  I simply picked a few exponents trying to find
> one that could run a thousand or so iterations without the convolution
> error greatly exceeding 0.3.
> 
>       For this release, I thought it would be nice to use a more
> scientific approach.  What I've done is run almost 10000 iterations
> of an exponent and used Excel to graph the number of iterations with 
> an error between 0.200 and 0.201, 0.201 and 0.202, etc.  As you can imagine
> you get a Bell-like curve that peaks at some value.
> 
...
> 
> P.S.  A brief explanation of convolution errors:  Since the FFT in prime95
> is done in floating point arithmetic, round-off errors build up.  After the
> FFT, every value is rounded back to an integer.  The convolution error is
> the largest amount that a result value differs from the correct integer
> value.  If the convolution error exceeds 0.5, then the round-back-to-integer
> step will round to the wrong integer value - a catastrophic failure.
> 
> Another way to look at this:  Each FFT value contains about 20 bits of data.
> That is, values ranging from 2^19 to -2^19.  Consider an FFT of 128K elements.
> If each FFT value was close to the 2^19 value, then each result value would
> need 19+19+17=55 bits to hold the result.  Since a float only holds 53 bits
> there would be a loss of information.  Fortunately, probability assures us
> that it is extremely unlikely that every (or even most) FFT values will
> be close to the 2^19 maximum value.

Maybe my approach is completely misguided, but I would prefer to look at
the algorithm/source code and try to prove something. What are the
relevant parts of the source code? Is this part written in some
language which is more or less easy for a human to understand
(such as C instead of assembly)?

Also, I would like to make sure I understand the arithmetic in the last
paragraph. You say the FFT values contain 20 bits of data each and that
they range from 2^19 to -2^19: I assume this means that they are
integers with 19 digits base 2, plus one bit for the sign.
But what are these integers? Surely not an instance of the Lucas-Lehmer
sequence mod p, or they would have as many bits as p, which is around
23 for the values of p being currently tested.
I also assume that 17 comes from 128K = 2^17, but why do we need
19+19+17 bits to hold the result?

By the way, this is a somewhat dumb question, but is even an error of .499
considered ok (when rounding to the nearest integer, I mean)? Probably
not... what is the maximum tolerated error?

Nicolau

http://www.mat.puc-rio.br/~nicolau


________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sun, 9 May 1999 12:38:40 -0300 (EST)
From: "Nicolau C. Saldanha" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

On Sun, 9 May 1999, I wrote:

> Also, I would like to make sure I understand the arithmetic in the last
> paragraph. You say the FFT values contain 20 bits of data each and that
> they range from 2^19 to -2^19: I assume this means that they are
> integers with 19 digits base 2, plus one bit for the sign.
> But what are these integers? Surely not an instance of the Lucas-Lehmer
> sequence mod p, or they would have as many bits as p, which is around  
> 23 for the values of p being currently tested.

That was a stupid possibility, of course. It is the *exponents*
which have approx 23 bits and the LL sequence must be computed
modulo the *Mersenne number*, which has millions of bits.
But I still do not understand what exactly has 20 bits.
Perhaps a term of the LL sequence is written in base 2^20,
so that each "digit" is a block of 20 bits?
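Nicolau's guess is essentially the right picture, and George's 19+19+17 arithmetic follows from it. A sketch (prime95's real scheme is subtler - an irrational base with balanced digits - but the bit counting is the same idea):

```python
# Write a big number in base 2^20, so each FFT element holds one ~20-bit
# "digit".  This is an illustration of the idea, not prime95's actual
# representation.
B = 1 << 20

def to_digits(n, base=B):
    digits = []
    while n:
        digits.append(n % base)
        n //= base
    return digits

# One convolution component sums up to N digit products.  With balanced
# digits of magnitude up to 2^19 and a 128K = 2^17 element FFT:
N_LOG2 = 17
worst_bits = 19 + 19 + N_LOG2
print(to_digits((1 << 40) + 3))   # [3, 0, 1]
print(worst_bits)                 # 55, above the 53-bit double mantissa
```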

Nicolau

http://www.mat.puc-rio.br/~nicolau

________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sun, 09 May 1999 19:03:12 EDT
From: "Foghorn Leghorn" <[EMAIL PROTECTED]>
Subject: Mersenne: P587 factored?

I noticed that P587 is no longer on George's ECM factoring page, but there 
is no factor listed. Did someone else find a factor?


_______________________________________________________________
Get Free Email and Do More On The Web. Visit http://www.msn.com
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 03:47:10 +0200 (MET DST)
From: [EMAIL PROTECTED]
Subject: Re:  Mersenne: P587 factored?

Observant "Foghorn Leghorn" <[EMAIL PROTECTED]> asks:

> I noticed that P587 is no longer on Geroge's ECM factoring page, but there 
> is no factor listed. Did someone else find a factor?

     Robert Silverman sieved it by SNFS.  CWI did the linear algebra.

C(2,587+)
  2769467  13119770765051463547
* C151 = prp71 * prp80.   SNFS Silverman/CWI
  58304599029582814346174509784805811323612591021863318902938306546239131
  79700481573792089991933884278502015980406487362166583968208410611454698961420697


Conrad Curry is recruiting volunteers for M617 by SNFS.
CWI is doing 2,1186L and 2,1198L.
[2,1198L denotes 2^599 - 2^300 + 1, a divisor of 2^1198 + 1.]
These will complete 2,n+- for n <= 600 and 2,nLM for n <= 1200.
Under ten composite b^n +- 1 with b <= 12 and b^n <= 10^180
remain in the Cunningham table.


________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 10:37:34 GMT
From: "Brian J Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

Yesterday I wrote
> 
> Could I suggest that the maximum convolution error is likely to occur
> when whole blocks of bits are 1, thus forcing large integers into the
> transform so that rounding/truncation errors are most likely to occur.
> 
> Consider the LL test algorithm x' = (x^2 - 2) (mod 2^p - 1)
> 
> If x = 1, then x' = -1, x'' = +1, x''' = -1, etc, etc, for all values

Aaarrghhh! I think I must have consumed too much Old Bushmills 
brain solvent ... x = 1, x' = x'' = x''' = -1 ... 8*)

> of p. And -1 (mod 2^p - 1) = (2^p - 2) (mod 2^p -1). Furthermore
> 2^p - 2 = 2 * (2^(p-1) - 1), a number which contains (p-1) consecutive
> 1 bits.

Fortunately my error does not affect this argument.
> 
> So running a couple of iterations of the LL test algorithm starting
> at 1 (instead of 4), looking at the max convolution error and checking
> that the value of the residual after 2 iterations is 1, might be
> instructive ... ?

Well, I tried it using a modified version of Richard Crandall's generic 
C program lucdwt. Starting with 1 instead of 4 gives the correct 
residual 0xFFFFFFFFFFFFFFFE (-1 mod (2^p-1)) for all iterations 
and with _zero_ error, provided the transform size > p/32.

So, almost all zero bits and almost all one bits are both very easy 
on the transform.

Is it possible to generate "worst case" data, and, if so, would 
anyone care to suggest what it might be? e.g. I remember we used 
to fill space with 0xE5 on FM disks, or 0x6DB6 on MFM disks, 
because these bit patterns were particularly difficult for the 
hardware to cope with - thus, if the test pattern was reliable, the 
data should be safe.

Bearing in mind the strong links between signal processing and 
Fourier transforms, this illustration may possibly be not entirely 
without relevance.

My feeling is that running the LL test itself, starting at 4, is 
probably as effective a way as any of pseudo-randomly generating 
bit patterns for the purpose of testing the FFT logic, unless 
someone can demonstrate a better method. (Of course it takes a 
few iterations for the mod 2^p-1 to "kick in" and start to 
"randomize" the bit pattern - but after less than 50 iterations the 
initial conditions have been well and truly obfuscated)



Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 11:06:27 GMT
From: "Brian J Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

> > though.  So outside about 14 sigmas you should be able to say the
> > probability is below 10e-40. The problem is that if there are small
> > deviations from "Gaussian-ness" way out on the wings of your distribution,
> > the REAL probability of a certain result is not well approximated by the
> > Error Function result.
> 
> This is a real good point - if we are assuming a Gaussian distribution, then
> we are assuming the best case. The worst case is given by Tchebycheff's
> theorem, which states that, given a probability distribution where only the
> mean and standard deviation is known, then the probability that an
> observation will fall more than x standard deviations from the mean is
> bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but
> with a very unlikely distribution). In other words, if you have no
> guarantees about the distribution, "counting sigmas" is going to give you a
> false sense of security, and if the distribution is even slightly deviant
> from Gaussian, then the result can be very wrong indeed.

This is all fine - except the assertion about a Gaussian distribution 
being the best case.

Take a rectangular distribution - a continuous variable is equally 
likely to have any value between 0 and 1, but never less than 0 nor 
greater than 1.

This distribution has a mean of 0.5 and a standard deviation of 
0.289 (variance = 1/12), so _all_ observations taken from this 
distribution are well within 2 sigmas of the mean.
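Brian's numbers check out: the farthest a uniform observation can lie from the mean is 0.5, which is sqrt(3) (about 1.73) standard deviations.

```python
import math

# Rectangular (uniform) distribution on [0, 1]: mean 1/2, variance 1/12.
mean = 0.5
sigma = math.sqrt(1 / 12)          # ~0.2887
max_sigmas = 0.5 / sigma           # = sqrt(3) ~ 1.73
print(round(sigma, 4), round(max_sigmas, 2))
```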

In fact, any distribution with negative kurtosis will be a better case 
than a Gaussian distribution in this respect, provided that it is 
symmetric (zero skew) and reasonably well behaved.

On numeric grounds, we should expect the distribution of the 
underlying errors to have a negative kurtosis. This is because of the 
finite precision of the result - the data points themselves are 
uncertain to an extent, e.g. if the data is accurate to only 4 bits 
precision then a "true" value of 0.1 could be recorded as either 
0.0625 or 0.1250. This effect tends to reduce the kurtosis (the 
central "hump" of the distribution is flattened and broadened - this 
causes an overestimation of the population standard deviation from 
a sample of observations, however large).


Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 09:49:39 -0300 (EST)
From: "Nicolau C. Saldanha" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

On Mon, 10 May 1999, Brian J Beesley wrote:

> > > though.  So outside about 14 sigmas you should be able to say the
> > > probability is below 10e-40. The problem is that if there are small
> > > deviations from "Gaussian-ness" way out on the wings of your distribution,
> > > the REAL probability of a certain result is not well approximated by the
> > > Error Function result.

...

> In fact, any distribution with negative kurtosis will be a better case 
> than a Gaussian distribution in this respect, provided that it is 
> symmetric (zero skew) and reasonably well behaved.
> 
> On numeric grounds, we should expect the distribution of the 
> underlying errors to have a negative kurtosis. This is because of the 
> finite precision of the result - the data points themselves are 
> uncertain to an extent, e.g. if the data is accurate to only 4 bits 
> precision then a "true" value of 0.1 could be recorded as either 
> 0.0625 or 0.1250. This effect tends to reduce the kurtosis (the 
> central "hump" of the distribution is flattened and broadened - this 
> causes an overestimation of the population standard deviation from 
> a sample of observations, however large).

I am sorry, but what is kurtosis? Some measure of the speed with which
the tail of the distribution falls, maybe? Or some sort of curvature?

My own impression is that since the errors come from adding many many
small errors we should expect something close to a gaussian distribution,
except that very large errors should be not just very improbable
but actually impossible. Large but not so very large errors should
also have true probabilities smaller than whatever is predicted by
the gaussian distribution.

http://www.mat.puc-rio.br/~nicolau

________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 14:28:52 -0000
From: [EMAIL PROTECTED]
Subject: Re: Mersenne: Any statistics majors out there?

>I am sorry, but what is kurtosis? Some measure of the speed with which
>the tail of the distribution falls, maybe? Or some sort of curvature?

Kurtosis is based on the fourth central moment; "excess kurtosis" is its
excess with respect to that of the normal distribution.

A distribution with positive kurtosis has longer "tails" than a normal
distribution with the same standard deviation would have.

A good example of a smooth continuous distribution with negative
kurtosis would be defined by the probability distribution function
p = a * e^(-(x-u)^4/b) (for some constants a,b)
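Taking u = 0 and b = 1 in Brian's example, a quick numerical integration confirms this density has kurtosis below the normal distribution's value of 3:

```python
import math

# Estimate the kurtosis of p(x) ~ exp(-x^4) numerically.
def integrate(f, a=-5.0, b=5.0, steps=20000):
    # simple midpoint rule; the integrand is negligible outside [-5, 5]
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

w = lambda x: math.exp(-x ** 4)
Z = integrate(w)
m2 = integrate(lambda x: x * x * w(x)) / Z     # second central moment
m4 = integrate(lambda x: x ** 4 * w(x)) / Z    # fourth central moment
kurtosis = m4 / (m2 * m2)
print(round(kurtosis, 3))          # about 2.19, well below the normal's 3
```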
>
>My own impression is that since the errors come from adding many many
>small errors we should expect something close to a gaussian distribution,
>except that very large errors should be not just very improbable
>but actually impossible. Large but not so very large errors should
>also have true probabilities smaller than whatever is predicted by
>the gaussian distribution.

The source of the errors is the difference between a floating-point
value and the nearest integer. The error measure is therefore not
the result of summing many small errors.
Also, we are operating in the region where the nearest integer sometimes
has nearly as many significant bits as the precision of the floating-
point number being compared.
What we are trying to find is how far we can push this before we run
into a real danger that we will round off to the wrong integer. The
point is that, the less "guard bits" we can get away with, the larger
the exponents we can use with a particular transform size - and there
is a substantial performance loss for each increase in transform size
in terms of CPU time per iteration.
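The trade-off Brian describes can be put in numbers using George's worst-case count: two balanced words multiplied, summed over the transform length. The constants below mirror his 19+19+17 example and are illustrative, not prime95's actual limits:

```python
# Worst-case bits in one convolution component: two balanced words of
# ~word_bits bits multiplied (19+19 in George's example), summed over
# 2^log2_fft_size product terms.
def worst_case_bits(word_bits, log2_fft_size):
    return 2 * (word_bits - 1) + log2_fft_size

print(worst_case_bits(20, 17))   # 55: over the 53-bit double mantissa
print(worst_case_bits(19, 17))   # 53: one less bit per word buys it back
```

Each bit shaved off the per-word size (i.e. each extra guard bit) costs a proportional reduction in the largest exponent a given transform size can handle, which is exactly why pinning down the real error distribution matters.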

Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 13:16:46 -0500 (CDT)
From: Conrad Curry <[EMAIL PROTECTED]>
Subject: Mersenne: M617 project

  On May 4 we found that the c142 cofactor of C(2,783+) = prp62 * prp80,
where

        prp62 = 4436770067411056146721385097277029376746249130734\
                5678065875601

        prp80 = 9279542732670686216319364380604172573667266301775\
                1811620361610403349054728021467

  A total of 8245250 relations were collected by the volunteer
sievers resulting in a matrix of size 840K x 846K.

  This weekend sieving for M617 has started.  Binaries for DOS and
linux are available from ftp://ftp.netdoor.com/users/acurry/nfs
Let me know if you want binaries for other platforms.

  If you want to participate email me privately and I will reserve
a range for you to sieve.

  The volunteer sievers for C(2,783+) are (in alphabetical order)

        Pierre Abbat
        Conrad Curry
        Jim Howell
        Alex Kruppa
        Don Leclair
        Henrik Olsen

________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 19:09:39 -0400
From: "Ernst W. Mayer" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Any statistics majors out there?

George Woltman wrote:

>       I'm working on version 19 of prime95 and I need your help.
>In the past, the exponents at which a larger FFT is used was picked
>rather haphazardly.  I simply picked a few exponents trying to find
>one that could run a thousand or so iterations without the convolution
>error greatly exceeding 0.3.
>
>       For this release, I thought it would be nice to use a more
>scientific approach.  What I've done is run almost 10000 iterations
>of an exponent and used Excel to graph the number of iterations with 
>an error between 0.200 and 0.201, 0.201 and 0.202, etc.  As you can imagine
>you get a Bell-like curve that peaks at some value.

Numerical Recipes (I believe both 1st and 2nd editions) has a description
of the use of the FFT for multiplication of large integers - while their
actual implementation is awful and their error estimates are flawed (see
below), it's a useful starting point. I quote directly from Section 20.6
of the 2nd edition, "Arithmetic at arbitrary precision" (my annotations
in {}):

        "In [Section] 13.1 we learned how to compute the convolution of two
        vectors by the fast Fourier transform (FFT): Each vector is FFT'd,
        the two complex transforms are {dyadically, or pointwise} multiplied,
        and the result is inverse-FFT'd. Since the transforms are done with
        floating arithmetic, we need sufficient precision so that the exact
        integer value of each component {digit} of the result is discernible
        in the presence of roundoff error. We should therefore allow a
        (conservative) few times log_2(log_2 N) bits for roundoff in the FFT.

        A number of length N bytes in radix 256 {NR's preferred base for
        arbitrary-length integer arithmetic} can generate convolution components
        as large as the order of (256)^2 N, thus requiring 16 + log_2 N bits of
        precision for exact storage.

        If [it] is the number of bits in the floating mantissa (cf. [Section]
        20.1), we obtain the condition

                16 + log_2 N + few x log_2 log_2 N < it         (20.6.3)

        We see that single precision, say with [it] = 24, is inadequate for
        any interesting value of N, while double precision, say with [it] = 53,
        allows N to be greater than 10^6, corresponding to some millions of
        decimal digits."
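The scheme NR describe can be sketched in a few lines of Python (an illustrative toy with radix 256, not NR's actual routines; digit and vector names are my own). It multiplies two integers via a floating-point FFT convolution and reports the worst-case distance of any convolution component from the nearest integer, which is exactly the quantity condition (20.6.3) is meant to bound:

```python
import cmath
import random

def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of 2.
    The inverse transform is unnormalized (caller divides by n)."""
    n = len(a)
    if n == 1:
        return a[:]
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_multiply(x, y, radix=256):
    """Multiply nonnegative integers x, y via floating-point convolution.
    Returns (product, max distance of any convolution component from the
    nearest integer -- the roundoff that (20.6.3) must keep below 0.5)."""
    def to_digits(v):
        d = []
        while v:
            d.append(v % radix)
            v //= radix
        return d or [0]
    dx, dy = to_digits(x), to_digits(y)
    n = 1
    while n < len(dx) + len(dy):
        n *= 2
    fx = fft([complex(d) for d in dx + [0] * (n - len(dx))])
    fy = fft([complex(d) for d in dy + [0] * (n - len(dy))])
    # Dyadic (pointwise) multiply, inverse-FFT, normalize by n.
    conv = [c / n for c in fft([u * v for u, v in zip(fx, fy)], invert=True)]
    max_err = max(abs(c.real - round(c.real)) for c in conv)
    # Round each component to an integer; Python bignums absorb the carries.
    product = sum(round(c.real) * radix ** i for i, c in enumerate(conv))
    return product, max_err

random.seed(1)
a, b = random.getrandbits(512), random.getrandbits(512)
product, err = fft_multiply(a, b)
```

For 512-bit inputs the convolution components stay tiny relative to the 53-bit double mantissa, so `err` comes out many orders of magnitude below the 0.5 failure threshold.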

Alas, careful analysis puts a few dents in this tidy framework - let's review.

First: for a random-walk type error behavior, i.e. error roughly proportional
to the square root of the number of correlated floating-point operations, the FFT
having O(N log_2 N) of these, the accumulated error is O(sqrt(N log_2 N)),
corresponding to O(log_2 sqrt(N log_2 N)) = O(log_2 N + log_2 log_2 N)
contaminated bits, the square root having been absorbed as a factor of 1/2
into the implied constant of proportionality. Thus, the NR estimate of
log_2(log_2 N) bits for roundoff in the FFT is asymptotically too small,
which perhaps explains why they require "a few times" log_2(log_2 N)
for their estimate to work, even though the constant of proportionality
predicted by the FFT opcount is O(1).
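The gap between the two estimates is easy to tabulate (plain arithmetic from the formulas above, constants of proportionality omitted in both):

```python
import math

def roundoff_bits(N):
    """Contaminated-bit estimates for the FFT roundoff: NR's
    log2(log2 N) versus the random-walk count log2(N) + log2(log2 N)."""
    nr = math.log2(math.log2(N))
    rw = math.log2(N) + nr
    return nr, rw

for p in (12, 16, 20):
    nr, rw = roundoff_bits(2 ** p)
    print(f"N = 2^{p:2d}: NR ~ {nr:5.2f} bits, random-walk ~ {rw:5.2f} bits")
```

Already at N = 2^12 the random-walk count is roughly four times NR's, and the ratio only grows with N.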

Second: NR use a strictly nonnegative-digit representation for their vectors,
i.e. digits in [0, radix-1], and an FFT of such indeed can generate very large
convolution components - e.g. the first result element, which is simply the sum
of all N input digits, is guaranteed to be O(radix*N) large for a random string
of such digits. The average digit size, on the other hand, follows a random-walk
behavior, i.e. is O(radix*sqrt(N)). If we divide by the sqrt(N) factor included
in the definition of the DFT (or we can divide by N after the inverse FFT - it
doesn't matter which), our average digit size is now still O(radix). The dyadic
(pointwise) product then bumps the average size of the terms up to O(radix^2).
The inverse FFT takes these and produces average digit size O(radix^2*sqrt(N)),
but another divide by sqrt(N) brings that down to O(radix^2) again. Upon taking
the logarithm, this translates to O(2*log_2 radix) error bits.

Note: when we instead adopt a balanced-digit representation (as advocated by
Crandall et al.), i.e. digits in [-radix/2, +radix/2], then due to random
cancellations the convolution components, while still following a random-walk
behavior, are generally smaller than those obtained in nonnegative-digit form.
It is not clear whether the superior accuracy of balanced-digit form is due
more to this average effect or to the fact that there are no longer any selected
terms (e.g. the purely additive first FFT element) which are guaranteed to
be large, as occurs using nonnegative-digit form.
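A quick experiment illustrates the balanced-versus-nonnegative comparison. This uses an exact integer self-convolution, so the only thing measured is component size (the dynamic range the floating-point FFT must carry), not roundoff itself; the vector length and radix are illustrative choices, not George's or NR's:

```python
import random

def peak_convolution_component(digits):
    """Exact acyclic self-convolution of a digit vector; returns the
    largest |component|, a proxy for the dynamic range required."""
    n = len(digits)
    out = [0] * (2 * n - 1)
    for i, a in enumerate(digits):
        for j, b in enumerate(digits):
            out[i + j] += a * b
    return max(abs(c) for c in out)

random.seed(2)
N, radix = 1024, 256
nonneg = [random.randrange(radix) for _ in range(N)]
balanced = [d - radix // 2 for d in nonneg]  # digits in [-radix/2, radix/2)
peak_nonneg = peak_convolution_component(nonneg)
peak_balanced = peak_convolution_component(balanced)
```

The nonnegative peak sits near N*(radix/2)^2, while the balanced peak, built from random cancellations, is an order of magnitude smaller, consistent with the note above.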

Now, using the refined error estimate described above, allowing for a general
radix and for a fudge factor C to multiply the FFT error bits (this lumps
together the FFT opcount constant of proportionality, the fact that we're
doing not one but two FFTs, and the fact that certain operations, e.g. an
occasional subtract of nearly equal numbers or add of disparately sized
numbers, incur a much larger than average roundoff), we get

        2*log_2(radix) + C*[log_2 N + log_2 log_2 N] < it. (20.6.E)

Now let's extract some numbers from
the above. First we need to calibrate the constant C - using IEEE double
precision, [it] = 53, and N = 2^12, I can get around 22 bits per float
before roundoff errors hose my calculation. Thus, I get

        44 + C*[12 + 3.58...] ~= 53, or C ~= 0.577.

This predicts that for FFT length 256K (N = 2^18), I should be able to use
up to radix 2^20, i.e. test exponents up to 20*2^18 = 5242880, which is
quite close - I can get up to around 5200000 in practice.
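The calibrate-then-predict step can be written out explicitly. A small sketch, taking the 22-bits-at-N=2^12 datum from the text and otherwise just solving inequality (20.6.E) for C and then for log_2(radix):

```python
import math

IT = 53  # bits in an IEEE double mantissa

def fft_error_bits(N):
    # The bracketed term in (20.6.E): log2 N + log2 log2 N.
    return math.log2(N) + math.log2(math.log2(N))

def calibrate_C(bits_per_float, N):
    # Solve 2*b + C*fft_error_bits(N) = IT for C.
    return (IT - 2 * bits_per_float) / fft_error_bits(N)

def max_bits_per_float(C, N):
    # Solve (20.6.E) for log2(radix) at a given FFT length.
    return (IT - C * fft_error_bits(N)) / 2

C = calibrate_C(22, 2 ** 12)        # ~0.577, as in the text
b = max_bits_per_float(C, 2 ** 18)  # ~20.1 bits per float at length 256K
max_exponent = b * 2 ** 18          # predicted largest testable exponent
```

The raw prediction lands a shade above 5.2 million, bracketing the ~5200000 observed in practice.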

(The NR equation on the other hand, when written as

        2*log_2(radix) + log_2 N + C*log_2 log_2 N < it

gives for [it] = 53, N = 2^12 and radix = 2^22 that C is a negative number,
which is clearly unreasonable.)

Since these are asymptotic estimates and we are interested in runlengths
approaching 2^20, we should actually use the large-exponent data to calibrate
the inequality. For my code, I set log_2(radix) = 5200000/2^18 = 19.836...,
and since N=2^18, this gives C = 0.6011. (give or take :)

Now let's look at George's data, which will be slightly different due
to Intel 80-bit register type (which is in effect only while data are
in registers). http://www.mersenne.org/status.htm says that Prime95
can test up to p = 5260000 using N = 2^18. This gives C = 0.5805. Now
plugging N=2^20 in, the formula (20.6.E) gives log_2(radix) = 19.4407...,
and predicts that for vector length 1024K, Prime95 should be able to
test up to p ~= 20385000. The above page claims 20500000, so that is
quite close.
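The same two-step computation reproduces the Prime95 numbers above (breakpoint datum from mersenne.org as quoted; [it] = 53 as used in the text, the 80-bit registers showing up only through the fitted C):

```python
import math

IT = 53  # mantissa bits used in the fit, per the text

def fft_error_bits(N):
    # The bracketed term in (20.6.E): log2 N + log2 log2 N.
    return math.log2(N) + math.log2(math.log2(N))

# Calibrate C from the published 256K breakpoint: p = 5260000 at N = 2^18.
N0 = 2 ** 18
b0 = 5260000 / N0                    # log2(radix) ~ 20.065 bits per float
C = (IT - 2 * b0) / fft_error_bits(N0)

# Predict the 1024K breakpoint.
N1 = 2 ** 20
b1 = (IT - C * fft_error_bits(N1)) / 2
p_max = b1 * N1                      # predicted largest testable exponent
```

This gives C ~= 0.5805, b1 ~= 19.44 bits per float, and p_max just over 20.38 million, against the claimed 20500000.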

The actual distribution of fractional errors in the rounding phase
(George's "bell curve") is also a very interesting topic, but that's
another chapter. The above, calibrated using observed behavior,
will at least allow automation of the setting of FFT length breakpoints.

-Ernst

------------------------------

End of Mersenne Digest V1 #554
******************************
