Mersenne Digest Monday, May 10 1999 Volume 01 : Number 554

----------------------------------------------------------------------

Date: Sat, 08 May 1999 12:00:55 -0400
From: Chris Nash <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

> though. So outside about 14 sigmas you should be able to say the
> probability is below 10e-40. The problem is that if there are small
> deviations from "Gaussian-ness" way out on the wings of your distribution,
> the REAL probability of a certain result is not well approximated by the
> Error Function result.

This is a real good point - if we are assuming a Gaussian distribution, then we are assuming the best case. The worst case is given by Tchebycheff's theorem, which states that, given a probability distribution where only the mean and standard deviation are known, the probability that an observation will fall more than x standard deviations from the mean is bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but only for a very unlikely distribution.) In other words, if you have no guarantees about the distribution, "counting sigmas" is going to give you a false sense of security, and if the distribution is even slightly deviant from Gaussian, then the result can be very wrong indeed.

However we do have a little comfort. George's function - maximum convolution error from a single iteration - does have some distribution. It "appears" bell-shaped and looks like a normal distribution, and the samples George has made are also reasonably sized. Not only that, each observation is actually the maximum of the deviations from integer over many channels in the FFT - in other words, the observations are already coming from a "smoothed" distribution. I'm sure the distribution of a maximum of several observations is something for which there is a statistical test. (The resulting distribution may not be normal, but may have an analogous test.)
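A quick numerical check of just how far apart the two tail estimates are at 14 sigmas (a standard-library sketch; the function names are mine, not from any of the code under discussion):

```python
import math

def chebyshev_tail_bound(k):
    # Tchebycheff: for ANY distribution with finite variance,
    # P(|X - mean| >= k*sigma) <= 1/k^2.
    return 1.0 / (k * k)

def gaussian_tail(k):
    # Two-sided Gaussian tail: P(|X - mean| >= k*sigma) = erfc(k/sqrt(2)).
    return math.erfc(k / math.sqrt(2.0))

print(chebyshev_tail_bound(14))  # 1/196, about 5.1e-3
print(gaussian_tail(14))         # about 1.6e-44
```

At 14 sigmas the distribution-free bound is some forty orders of magnitude weaker than the Gaussian figure - exactly the gap between "counting sigmas" with and without a guarantee about the distribution.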
Thanks to Todd for backing up the heuristic calculation of "14 sigmas" that I got the hard way :)

Chris
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sat, 08 May 1999 18:17:48 -0500
From: Ken Kriesel <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

Aren't gaussians symmetric about the mean value? What George plotted is not.

At 12:00 PM 1999/05/08 -0400, you wrote:
>This is a real good point - if we are assuming a Gaussian distribution, then
>we are assuming the best case. [...]

------------------------------

Date: Sat, 08 May 1999 20:55:17 -0400
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

At 12:00 PM 5/8/99 -0400, Chris Nash wrote:
>This is a real good point - if we are assuming a Gaussian distribution, then
>we are assuming the best case. The worst case is given by Tchebycheff's
>theorem, which states that, given a probability distribution where only the
>mean and standard deviation is known, then the probability that an
>observation will fall more than x standard deviations from the mean is
>bounded above *only* by 1/x^2.

Yes, but that is a really bad worst case. Pathological even. Looking at the graphs of the two data sets, you can tell that it is reasonably well behaved.

+-------------------------------------------+
| Jud McCranie [EMAIL PROTECTED]            |
|                                           |
| I have macular stars in my eyes.          |
+-------------------------------------------+

------------------------------

Date: Sat, 08 May 1999 21:01:06 -0400
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

At 06:17 PM 5/8/99 -0500, Ken Kriesel wrote:
>Aren't gaussians symmetric about the mean value? What George plotted is not.

Yes, but it isn't too far off. But it does drop off quite a bit faster on the left. That is why I think it may be better to chop the data at the mean, throw away the data under the mean, reflect the remaining data with respect to the mean, and calculate the standard deviation from that "massaged" data. Then a normal distribution should approximate it pretty well.

+-------------------------------------------+
| Jud McCranie [EMAIL PROTECTED]            |
|                                           |
| I have macular stars in my eyes.          |
+-------------------------------------------+

------------------------------

Date: Sun, 9 May 1999 08:45:11 -0000
From: [EMAIL PROTECTED]
Subject: Re: Mersenne: Any statistics majors out there?

Hi,

I have an honours degree in mathematics with a fair amount of statistics content; unfortunately I haven't used the statistics much since I graduated in 1975, so feel free to shoot down what I have to say if you think I'm wrong.

Firstly, what we are measuring is the *maximum* value from a sequence of observations; therefore we would not expect the observed distribution to be normal, even if the underlying distribution were. I would expect the distribution to have a positive skew (i.e. to be "squeezed" on the side below the mean, and "stretched" on the side above the mean).

As an example, I'm going to stick to numbers small enough to check easily.
The probabilities come from a standard set of tables of the normal distribution, to four significant figures.

For _any_ distribution, if the probability that a random variable is less than x is p, then the probability that all of a set of N random variables from the same distribution are less than x is p^N; i.e. the probability that at least one of the N random variables will be greater than or equal to x is 1 - p^N.

Suppose that the random variables come from a normal distribution with mean 0 and standard deviation 1. However, we take the absolute value of the variables - the distribution is symmetric about zero, so this means that we are dealing with twice the integral from 0 to +infinity instead of the integral from -infinity to +infinity.

We take the maximum of 100 random variables drawn from this "half-normal" distribution and wish to look at the distribution of the new variable represented by this maximum function. The following tabulated values should illustrate the shape of the distribution. The left hand column is the value of x, the second column is the value of the cumulative distribution function of the normal distribution (from standard stats tables), the third value is the corrected CDF for the upper half of the distribution, and the fourth value is the CDF of the distribution of the maxima of 100 observations.
1.0  0.8413  0.6826  0.000000
1.1  0.8643  0.7286  0.000000
1.2  0.8849  0.7698  0.000000
1.3  0.9032  0.8064  0.000000
1.4  0.9192  0.8384  0.000000
1.5  0.9332  0.8664  0.000001
1.6  0.9452  0.8904  0.000009
1.7  0.9554  0.9108  0.000088
1.8  0.9641  0.9282  0.000581
1.9  0.9713  0.9426  0.002709
2.0  0.9772  0.9544  0.009398
2.1  0.9821  0.9642  0.026104
2.2  0.9861  0.9722  0.059643
2.3  0.9893  0.9786  0.114953
2.4  0.9918  0.9836  0.191360
2.5  0.9938  0.9876  0.287150
2.6  0.9953  0.9906  0.388895
2.7  0.9965  0.9930  0.495364
2.8  0.9974  0.9948  0.593715
2.9  0.9981  0.9962  0.683367
3.0  0.9986  0.9972  0.755487
3.1  0.9990  0.9980  0.818567
3.2  0.9993  0.9986  0.869273
3.3  0.9995  0.9990  0.904792
3.4  0.9997  0.9994  0.941748
3.5  0.9998  0.9996  0.960782

.. at which point the four digit precision of the tables becomes insufficient to make it worthwhile tabulating more data. But I think it's clear that the CDF rises more slowly on the side above the median (just above 2.7) than it does on the lower side. (Sorry I chose 100 now - there is an unfortunate coincidence between "just above 2.7" and e, which is accidental - in fact the median is more accurately 2.705)

What does all this mean? Unfortunately I can't see much, except that given a value which is the maximum of a set of 100 data values drawn from a population of positive real values, we can (swallowing hard and trusting in the central limit theorem) deduce that the standard deviation of the underlying distribution is about 1/2.7 of the given value. Perhaps this is therefore the wrong angle to attack the problem from.

....................................................................

Could I suggest that the maximum convolution error is likely to occur when whole blocks of bits are 1, thus forcing large integers into the transform so that rounding/truncation errors are most likely to occur.

Consider the LL test algorithm x' = (x^2 - 2) (mod 2^p - 1)

If x = 1, then x' = -1, x'' = +1, x''' = -1, etc, etc, for all values of p. And -1 (mod 2^p - 1) = (2^p - 2) (mod 2^p - 1).
Furthermore 2^p - 2 = 2 * (2^(p-1) - 1), a number which contains (p-1) consecutive 1 bits.

So running a couple of iterations of the LL test algorithm starting at 1 (instead of 4), looking at the max convolution error and checking that the value of the residual after 2 iterations is 1, might be instructive ... ?

Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sun, 09 May 1999 12:40:19 +0200
From: Guillermo Ballester Valor <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

Hi to all:

George Woltman wrote:
>
> P.S. A brief explanation of convolution errors: Since the FFT in prime95
> is done in floating point arithmetic, round-off errors build up. After the
> FFT, every value is rounded back to an integer. The convolution error is
> the largest amount that a result value differs from the correct integer
> value. If the convolution error exceeds 0.5, then the round-back-to-integer
> step will round to the wrong integer value - a catastrophic failure.
>
> Another way to look at this: Each FFT value contains about 20 bits of data.
> That is, values ranging from 2^19 to -2^19. Consider an FFT of 128K elements.
> If each FFT value was close to the 2^19 value, then each result value would
> need 19+19+17=55 bits to hold the result. Since a float only holds 53 bits
> there would be a loss of information. Fortunately, probability assures us
> that it is extremely unlikely that every (or even most) FFT values will
> be close to the 2^19 maximum value.
>

I'm not sure I've understood that correctly. The FFT is a good method to evaluate convolution products in O(n log n) time instead of O(n^2). Well, let M(p) be prime; then if the first term in the L-L test is L(0)=4, we will get L(p-2)=0 mod M(p). We also get L(p-3)*L(p-3)=2 mod M(p), which has two solutions.
One is 2^((p+1)/2); in its binary expansion there is only one '1' and all remaining bits are 0. The last convolution would be very easy. *BUT* what if L(p-3) is -2^((p+1)/2)? Then all bits in L(p-3) are '1' except a single '0'. If we perform a convolution for L(p-3)*L(p-3) we can have problems if we don't have enough bits.

MURPHY'S THEOREM: We will fail on the very last iteration and let a big prime slip away... along with $50,000. :-(

Luckily, the internal representation in each iteration is not the bits of the binary expansion in float form (I haven't studied in detail the FFT George uses), so don't worry. ;-)

- ----------------------------------
| Guillermo Ballester Valor        |
| [EMAIL PROTECTED]                |
| c/ cordoba, 19                   |
| 18151-Ogijares (Spain)           |
- ----------------------------------

________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sun, 9 May 1999 11:45:51 -0300 (EST)
From: "Nicolau C. Saldanha" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

On Fri, 7 May 1999, George Woltman wrote:

> Hi all,
>
> I'm working on version 19 of prime95 and I need your help.
> In the past, the exponents at which a larger FFT is used was picked
> rather haphazardly. I simply picked a few exponents trying to find
> one that could run a thousand or so iterations without the convolution
> error greatly exceeding 0.3.
>
> For this release, I thought it would be nice to use a more
> scientific approach. What I've done is run almost 10000 iterations
> of an exponent and used Excel to graph the number of iterations with
> an error between 0.200 and 0.201, 0.201 and 0.202, etc. As you can imagine
> you get a Bell-like curve that peaks at some value.
> ...
>
> P.S. A brief explanation of convolution errors: Since the FFT in prime95
> is done in floating point arithmetic, round-off errors build up. After the
> FFT, every value is rounded back to an integer.
The convolution error is > the largest amount that a result value differs from the correct integer > value. If the convolution error exceeds 0.5, then the round-back-to-integer > step will round to the wrong integer value - a catastrophic failure. > > Another way to look at this: Each FFT value contains about 20 bits of data. > That is, values ranging from 2^19 to -2^19. Consider an FFT of 128K elements. > If each FFT value was close to the 2^19 value, then each result value would > need 19+19+17=55 bits to hold the result. Since a float only holds 53 bits > there would be a loss of information. Fortunately, probability assures us > that it is extremely unlikely that every (or even most) FFT values will > be close to the 2^19 maximum value. Maybe my approach is completely misguided, but I would prefer to look at the algorithm/source code and try to prove something. What are the relevant parts of the source code? Is this part written in some language which is more or less easy for a human to understand (such as C instead of assembly)? Also, I would like to make sure I understand the arithmetic in the last paragraph. You say the FFT values contain 20 bits of data each and that they range from 2^19 to -2^19: I assume this means that they are integers with 19 digits base 2, plus one bit for the sign. But what are these integers? Surely not an instance of the Lucas-Lehmer sequence mod p, or they would have as many bits as p, which is around 23 for the values of p being currently tested. I also assume that 17 comes from 128K = 2^17, but why do we need 19+19+17 bits to hold the result? By the way, this is a somewhat dumb question, but is even an error of .499 considered ok (when rounding to the nearest integer, I mean)? Probably not... what is the maximum tolerated error? 
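On the .499 question: in principle any accumulated error strictly below 0.5 still rounds back to the right integer; it is only at 0.5 that the rounding silently picks the wrong one. A toy sketch (round_back is my name for the round-back-to-integer step, not Prime95's actual code; George's quoted criterion keeps observed errors near 0.3, far from the cliff):

```python
def round_back(x):
    # Round a convolution output to the nearest integer, as the
    # round-back-to-integer step does.
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)

true_value = 123456
for err in (0.001, 0.3, 0.499):
    assert round_back(true_value + err) == true_value   # still recoverable
assert round_back(true_value + 0.51) == true_value + 1  # silently wrong
```

The catch is that an observed error anywhere near 0.5 means the margin is already gone: one slightly worse round-off on the next iteration corrupts the result with no way to detect it.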
Nicolau
http://www.mat.puc-rio.br/~nicolau
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sun, 9 May 1999 12:38:40 -0300 (EST)
From: "Nicolau C. Saldanha" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

On Sun, 9 May 1999, I wrote:

> Also, I would like to make sure I understand the arithmetic in the last
> paragraph. You say the FFT values contain 20 bits of data each and that
> they range from 2^19 to -2^19: I assume this means that they are
> integers with 19 digits base 2, plus one bit for the sign.
> But what are these integers? Surely not an instance of the Lucas-Lehmer
> sequence mod p, or they would have as many bits as p, which is around
> 23 for the values of p being currently tested.

That was a stupid possibility, of course. It is the *exponents* which have approx 23 bits and the LL sequence must be computed modulo the *Mersenne number*, which has millions of bits. But I still do not understand what exactly has 20 bits. Perhaps a term of the LL sequence is written in base 2^20, so that each "digit" is a block of 20 bits?

Nicolau
http://www.mat.puc-rio.br/~nicolau
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Sun, 09 May 1999 19:03:12 EDT
From: "Foghorn Leghorn" <[EMAIL PROTECTED]>
Subject: Mersenne: P587 factored?

I noticed that P587 is no longer on George's ECM factoring page, but there is no factor listed. Did someone else find a factor?

_______________________________________________________________
Get Free Email and Do More On The Web.
Visit http://www.msn.com
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 03:47:10 +0200 (MET DST)
From: [EMAIL PROTECTED]
Subject: Re: Mersenne: P587 factored?

Observant "Foghorn Leghorn" <[EMAIL PROTECTED]> asks:

> I noticed that P587 is no longer on George's ECM factoring page, but there
> is no factor listed. Did someone else find a factor?

Robert Silverman sieved it by SNFS. CWI did the linear algebra.

C(2,587+) 2769467 13119770765051463547 * C151 = prp71 * prp80. SNFS Silverman/CWI
prp71 = 58304599029582814346174509784805811323612591021863318902938306546239131
prp80 = 79700481573792089991933884278502015980406487362166583968208410611454698961420697

Conrad Curry is recruiting volunteers for M617 by SNFS. CWI is doing 2,1186L and 2,1198L. [2,1198L denotes 2^599 - 2^300 + 1, a divisor of 2^1198 + 1.] These will complete 2,n+- for n <= 600 and 2,nLM for n <= 1200. Fewer than ten composite b^n +- 1 with b <= 12 and b^n <= 10^180 remain in the Cunningham table.

________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 10:37:34 GMT
From: "Brian J Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?

Yesterday I wrote:

> > Could I suggest that the maximum convolution error is likely to occur
> > when whole blocks of bits are 1, thus forcing large integers into the
> > transform so that rounding/truncation errors are most likely to occur.
> >
> > Consider the LL test algorithm x' = (x^2 - 2) (mod 2^p - 1)
> >
> > If x = 1, then x' = -1, x'' = +1, x''' = -1, etc, etc, for all values

Aaarrghhh! I think I must have consumed too much Old Bushmills brain solvent ... x = 1, x' = x'' = x''' = -1 ... 8*)

> of p. And -1 (mod 2^p - 1) = (2^p - 2) (mod 2^p - 1).
Furthermore > 2^p - 2 = 2 * (2^(p-1) - 1), a number which contains (p-1) consecutive > 1 bits. Fortunately my error does not affect this argument. > > So running a couple of iterations of the LL test algorithm starting > at 1 (instead of 4), looking at the max convolution error and checking > that the value of the residual after 2 iterations is 1, might be > instructive ... ? Well, I tried it using a modified version of Richard Crandall's generic C program lucdwt. Starting with 1 instead of 4 gives the correct residual 0xFFFFFFFFFFFFFFFE (-1 mod (2^p-1)) for all iterations and with _zero_ error, provided the transform size > p/32. So, almost all zero bits and almost all one bits are both very easy on the transform. Is it possible to generate "worst case" data, and, if so, would anyone care to suggest what it might be? e.g. I remember we used to fill space with 0xE5 on FM disks, or 0x6DB6 on MFM disks, because these bit patterns were particularly difficult for the hardware to cope with - thus, if the test pattern was reliable, the data should be safe. Bearing in mind the strong links between signal processing and Fourier transforms, this illustration may possibly be not entirely without relevance. My feeling is that running the LL test itself, starting at 4, is probably as effective a way as any of pseudo-randomly generating bit patterns for the purpose of testing the FFT logic, unless someone can demonstrate a better method. (Of course it takes a few iterations for the mod 2^p-1 to "kick in" and start to "randomize" the bit pattern - but after less than 50 iterations the initial conditions have been well and truly obfuscated) Regards Brian Beesley ________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm ------------------------------ Date: Mon, 10 May 1999 11:06:27 GMT From: "Brian J Beesley" <[EMAIL PROTECTED]> Subject: Re: Mersenne: Any statistics majors out there? > > though. 
So outside about 14 sigmas you should be able to say the > > probability is below 10e-40. The problem is that if there are small > > deviations from "Gaussian-ness" way out on the wings of your distribution, > > the REAL probability of a certain result is not well approximated by the > > Error Function result. > > This is a real good point - if we are assuming a Gaussian distribution, then > we are assuming the best case. The worst case is given by Tchebycheff's > theorem, which states that, given a probability distribution where only the > mean and standard deviation is known, then the probability that an > observation will fall more than x standard deviations from the mean is > bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but > with a very unlikely distribution). In other words, if you have no > guarantees about the distribution, "counting sigmas" is going to give you a > false sense of security, and if the distribution is even slightly deviant > from Gaussian, then the result can be very wrong indeed. This is all fine - except the assertion about a Gaussian distribution being the best case. Take a rectangular distribution - a continuous variable is equally likely to have any value between 0 and 1, but never less than 0 nor greater than 1. This distribution has a mean of 0.5 and a standard deviation of 0.289 (variance = 1/12), so _all_ observations taken from this distribution are well within 2 sigmas of the mean. In fact, any distribution with negative kurtosis will be a better case than a Gaussian distribution in this respect, provided that it is symmetric (zero skew) and reasonably well behaved. On numeric grounds, we should expect the distribution of the underlying errors to have a negative kurtosis. This is because of the finite precision of the result - the data points themselves are uncertain to an extent, e.g. if the data is accurate to only 4 bits precision then a "true" value of 0.1 could be recorded as either 0.0625 or 0.1250. 
This effect tends to reduce the kurtosis (the central "hump" of the distribution is flattened and broadened - this causes an overestimation of the population standard deviation from a sample of observations, however large). Regards Brian Beesley ________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm ------------------------------ Date: Mon, 10 May 1999 09:49:39 -0300 (EST) From: "Nicolau C. Saldanha" <[EMAIL PROTECTED]> Subject: Re: Mersenne: Any statistics majors out there? On Mon, 10 May 1999, Brian J Beesley wrote: > > > though. So outside about 14 sigmas you should be able to say the > > > probability is below 10e-40. The problem is that if there are small > > > deviations from "Gaussian-ness" way out on the wings of your distribution, > > > the REAL probability of a certain result is not well approximated by the > > > Error Function result. ... > In fact, any distribution with negative kurtosis will be a better case > than a Gaussian distribution in this respect, provided that it is > symmetric (zero skew) and reasonably well behaved. > > On numeric grounds, we should expect the distribution of the > underlying errors to have a negative kurtosis. This is because of the > finite precision of the result - the data points themselves are > uncertain to an extent, e.g. if the data is accurate to only 4 bits > precision then a "true" value of 0.1 could be recorded as either > 0.0625 or 0.1250. This effect tends to reduce the kurtosis (the > central "hump" of the distribution is flattened and broadened - this > causes an overestimation of the population standard deviation from > a sample of observations, however large). I am sorry, but what is kurtosis? Some measure of the speed with which the tail of the distribution falls, maybe? Or some sort of curvature? 
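One way to pin the term down numerically: kurtosis is the fourth central moment divided by the variance squared (3 for a Gaussian; "excess" kurtosis subtracts that 3). The rectangular distribution discussed above has moments simple enough to write down exactly:

```python
import math

# Exact central moments of the rectangular (uniform) distribution on [0, 1]:
variance = 1.0 / 12.0        # so sigma = 0.2887, the 0.289 quoted earlier
fourth_moment = 1.0 / 80.0   # integral of (x - 1/2)^4 over [0, 1]

sigma = math.sqrt(variance)
excess_kurtosis = fourth_moment / variance ** 2 - 3.0

print(sigma)            # ~0.2887
print(excess_kurtosis)  # -1.2: negative kurtosis, i.e. short tails
```

The -1.2 makes the earlier point quantitative: the rectangular distribution's tails cut off completely, the opposite extreme from the heavy-tailed case where "counting sigmas" misleads.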
My own impression is that since the errors come from adding many many small errors we should expect something close to a gaussian distribution, except that very large errors should be not just very improbable but actually impossible. Large but not so very large errors should also have true probabilities smaller than whatever is predicted by the gaussian distribution.

http://www.mat.puc-rio.br/~nicolau
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 14:28:52 -0000
From: [EMAIL PROTECTED]
Subject: Re: Mersenne: Any statistics majors out there?

>I am sorry, but what is kurtosis? Some measure of the speed with which
>the tail of the distribution falls, maybe? Or some sort of curvature?

Kurtosis is the excess of the fourth moment with respect to the normal distribution. A distribution with positive kurtosis has longer "tails" than a normal distribution with the same standard deviation would have. A good example of a smooth continuous distribution with negative kurtosis would be defined by the probability density function

p = a * e^(-(x-u)^4/b) (for some constants a, b)

>My own impression is that since the errors come from adding many many
>small errors we should expect something close to a gaussian distribution,
>except that very large errors should be not just very improbable
>but actually impossible. Large but not so very large errors should
>also have true probabilities smaller than whatever is predicted by
>the gaussian distribution.

The source of the errors is the difference between a floating-point value and the nearest integer. The error measure is therefore not the result of summing many small errors. Also, we are operating in the region where the nearest integer sometimes has nearly as many significant bits as the precision of the floating-point number being compared.
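That precision limit can be made concrete: an IEEE double carries a 53-bit significand, so integers stay exactly representable only up to 2^53, and the closer a result's magnitude gets to that limit the fewer bits remain to absorb round-off (a standard-library sketch, mine rather than anything from Prime95):

```python
LIMIT = 2 ** 53  # largest contiguous range of exact integers in a double

assert float(LIMIT - 1) != float(LIMIT)  # below the limit, integers are distinct
assert float(LIMIT + 1) == float(LIMIT)  # 2^53 + 1 is not representable

# One octave below the limit there is already no room for any fraction:
assert float(2 ** 52) + 0.5 == float(2 ** 52)
```

Every bit a result value needs below 53 is a "guard bit" available to keep the accumulated convolution error away from the fatal 0.5.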
What we are trying to find is how far we can push this before we run into a real danger that we will round off to the wrong integer. The point is that, the fewer "guard bits" we can get away with, the larger the exponents we can use with a particular transform size - and there is a substantial performance loss for each increase in transform size in terms of CPU time per iteration.

Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 13:16:46 -0500 (CDT)
From: Conrad Curry <[EMAIL PROTECTED]>
Subject: Mersenne: M617 project

On May 4 we found that the c142 of C(2,783+) = prp62 * prp80, where

prp62 = 44367700674110561467213850972770293767462491307345678065875601
prp80 = 92795427326706862163193643806041725736672663017751811620361610403349054728021467

A total of 8245250 relations were collected by the volunteer sievers, resulting in a matrix of size 840K x 846K.

This weekend sieving for M617 has started. Binaries for DOS and linux are available from ftp://ftp.netdoor.com/users/acurry/nfs - let me know if you want binaries for other platforms. If you want to participate, email me privately and I will reserve a range for you to sieve.

The volunteer sievers for C(2,783+) are (in alphabetical order):

Pierre Abbat
Conrad Curry
Jim Howell
Alex Kruppa
Don Leclair
Henrik Olsen

________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

------------------------------

Date: Mon, 10 May 1999 19:09:39 -0400
From: "Ernst W. Mayer" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Any statistics majors out there?

George Woltman wrote:

> I'm working on version 19 of prime95 and I need your help.
> In the past, the exponents at which a larger FFT is used was picked
> rather haphazardly.
I simply picked a few exponents trying to find >one that could run a thousand or so iterations without the convolution >error greatly exceeding 0.3. > > For this release, I thought it would be nice to use a more >scientific approach. What I've done is run almost 10000 iterations >of an exponent and used Excel to graph the number of iterations with >an error between 0.200 and 0.201, 0.201 and 0.202, etc. As you can imagine >you get a Bell-like curve that peaks at some value. Numerical Recipes (I believe both 1st and 2nd editions) has a description of the use of the FFT for multiplication of large integers - while their actual implementation is awful and their error estimates are flawed (see below), it's a useful starting point. I quote directly from Section 20.6 of the 2nd edition, "Arithmetic at arbitrary precision" (my annotations in {}): "In [Section] 13.1 we learned how to compute the convolution of two vectors by the fast Fourier transform (FFT): Each vector is FFT'd, the two complex transforms are {dyadically, or pointwise} multiplied, and the result is inverse-FFT'd. Since the transforms are done with floating arithmetic, we need sufficient precision so that the exact integer value of each component {digit} of the result is discernible in the presence of roundoff error. We should therefore allow a (conservative) few times log_2(log_2 N) bits for roundoff in the FFT. A number of length N bytes in radix 256 {NR's preferred base for arbitrary-length integer arithmetic} can generate convolution components as large as the order of (256)^2 N, thus requiring 16 + log_2 N bits of precision for exact storage. If [it] is the number of bits in the floating mantissa (cf. 
[Section] 20.1), we obtain the condition 16 + log_2 N + few x log_2 log_2 N < it (20.6.3) We see that single precision, say with [it] = 24, is inadequate for any interesting value of N, while double precision, say with [it] = 53, allows N to be greater than 10^6, corresponding to some millions of decimal digits." Alas, careful analysis puts a few dents in this tidy framework - let's review. First: for a random-walk type error behavior, i.e. error roughly proportional to the square root of the number of correlated floating operations, the FFT having O(N log_2 N) of these, the accumulated error is O(sqrt(N log_2 N)), corresponding to O(log_2 sqrt(N log_2 N)) = O(log_2 N + log_2 log_2 N) contaminated bits, the square root having been absorbed as a factor of 1/2 into the implied constant of proportionality. Thus, the NR estimate of log_2(log_2 N) bits for roundoff in the FFT is asymptotically too small, which perhaps explains why they require "a few times" log_2(log_2 N) for their estimate to work, even though the constant of proportionality predicted by the FFT opcount is O(1). Second: NR use a strictly nonnegative-digit representation for their vectors, i.e. digits in [0, radix-1], and an FFT of such indeed can generate very large convolution components - e.g. the first result element, which is simply the sum of all N input digits, is guaranteed to be O(radix*N) large for a random string of such digits. The average digit size, on the other hand, follows a random-walk behavior, i.e. is O(radix*sqrt(N)). If we divide by the sqrt(N) factor included in the definition of the DFT (or we can divide by N after the inverse FFT - it doesn't matter which), our average digit size is now still O(radix). The dyadic (pointwise) product then bumps the average size of the terms up to O(radix^2). The inverse FFT takes these and produces average digit size O(radix^2*sqrt(N)), but another divide by sqrt(N) brings that down to O(radix^2) again. 
Upon taking the logarithm, the O(radix^2) average component size translates to O(2*log_2 radix) error bits.

Note: when we instead adopt a balanced-digit representation (as advocated by Crandall et al.), i.e. digits in [-radix/2, +radix/2], then due to random cancellations the convolution components, while still following a random-walk behavior, are generally smaller than those obtained in nonnegative-digit form. It is not clear whether the superior accuracy of balanced-digit form is due more to this average effect or to the fact that there are no longer any selected terms (e.g. the purely additive first FFT element) which are guaranteed to be large, as occurs using nonnegative-digit form.

Now, using the refined error estimate described above, allowing for a general radix and for a fudge factor C to multiply the FFT error bits (this lumps together the FFT opcount constant of proportionality, the fact that we're doing not one but two FFTs, and the fact that certain operations, e.g. an occasional subtract of nearly equal numbers or add of disparately sized numbers, incur a much larger than average roundoff), we get

    2*log_2(radix) + C*[log_2 N + log_2 log_2 N] < it.    (20.6.E)

Now let's extract some numbers from the above. First we need to calibrate the constant C: using IEEE double precision, [it] = 53, and N = 2^12, I can get around 22 bits per float before roundoff errors hose my calculation. Thus 44 + C*[12 + 3.58...] ~= 53, or C ~= 0.577. This predicts that for FFT length 256K (N = 2^18) I should be able to use up to radix 2^20, i.e. test exponents up to 20*2^18 = 5242880, which is quite close - in practice I can get up to around 5200000.

(The NR equation, on the other hand, when written as

    2*log_2(radix) + log_2 N + C*log_2 log_2 N < it

gives for [it] = 53, N = 2^12 and radix = 2^22 a negative value of C, which is clearly unreasonable.)
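The calibration above can be reproduced in a few lines. This is an illustrative sketch, not code from any of the programs discussed; the inputs - 53 mantissa bits, 22 usable bits per float at N = 2^12 - are the figures quoted in the text.

```python
import math

it = 53                                   # IEEE double: 53 mantissa bits

def fft_error_bits(lg_n):
    """The bracketed term log2(N) + log2(log2(N)) of (20.6.E)."""
    return lg_n + math.log2(lg_n)

# Calibrate C from the observation above: at N = 2^12 about 22 bits
# per float (radix 2^22) still round correctly.
C = (it - 2 * 22) / fft_error_bits(12)

# Predict the usable radix at FFT length 256K (N = 2^18):
lg_radix = (it - C * fft_error_bits(18)) / 2
p_max = math.floor(lg_radix) * 2**18      # whole bits/digit: radix 2^20

print(f"C ~= {C:.4f}")                    # ~0.577
print(f"log2(radix) at N=2^18: {lg_radix:.2f} -> p_max = {p_max}")
```

The floor to a whole number of bits per digit gives p_max = 20*2^18 = 5242880, the figure quoted above.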
Since these are asymptotic estimates and we are interested in run lengths approaching 2^20, we should actually use the large-exponent data to calibrate the inequality. For my code I set log_2(radix) = 5200000/2^18 = 19.836..., and since N = 2^18, this gives C = 0.6011 (give or take :).

Now let's look at George's data, which will be slightly different due to Intel's 80-bit register format (which is in effect only while data are in registers). http://www.mersenne.org/status.htm says that Prime95 can test up to p = 5260000 using N = 2^18. This gives C = 0.5805. Plugging N = 2^20 into formula (20.6.E) then gives log_2(radix) = 19.4407..., and predicts that for vector length 1024K Prime95 should be able to test up to p ~= 20385000. The above page claims 20500000, so that is quite close.

The actual distribution of fractional errors in the rounding phase (George's "bell curve") is also a very interesting topic, but that's another chapter. The above, calibrated using observed behavior, will at least allow automation of the setting of FFT length breakpoints.

- -Ernst

------------------------------

End of Mersenne Digest V1 #554
******************************
