Mersenne Digest Wednesday, May 12 1999 Volume 01 : Number 555
----------------------------------------------------------------------
Date: Mon, 10 May 1999 16:23:49 -0700
From: Todd Sauke <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?
George,
Given the uncertainty about the actual probability distribution function
involved, it seems that the best way to estimate the probability of having
no roundoff greater than 0.5 in one iteration is to simply plot one minus
your empirical cumulative distribution function (on a log scale) versus
roundoff error on a scale that goes out to 0.5, and extrapolate by
"eyeball" to see where the curve would hit 0.5 and whether it would be below
1e-10 or 1e-15 or whatever. Wherever the curve hits (or appears it would
hit) 0.5, times the number of needed LL iterations (p), gives an empirical
estimate of the probability that there will be at least one roundoff greater
than 0.5 (for small values of that probability, of course).
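A sketch of that recipe in Python, assuming the histogram is available as (bin edge, count) pairs; every number below is made up purely for illustration:

```python
import math

# hypothetical histogram of per-iteration roundoff errors: (upper bin edge, count)
hist = [(0.20, 99000), (0.25, 900), (0.30, 88), (0.35, 10), (0.40, 2)]
total = sum(c for _, c in hist)

# empirical tail P(error > x) on a log10 scale
tail = []
remaining = total
for edge, count in hist:
    remaining -= count
    if remaining > 0:
        tail.append((edge, math.log10(remaining / total)))

# least-squares line through (x, log10 P(error > x)), extrapolated to x = 0.5
n = len(tail)
mx = sum(x for x, _ in tail) / n
my = sum(y for _, y in tail) / n
slope = sum((x - mx) * (y - my) for x, y in tail) / sum((x - mx) ** 2 for x, _ in tail)
q = 10.0 ** (my + slope * (0.5 - mx))   # per-iteration P(error > 0.5)

p = 7_000_000   # hypothetical number of LL iterations
print(f"P(at least one roundoff > 0.5 over the run) ~= {p * q:.2e}")
```

The straight-line fit in log space replaces the "eyeball" extrapolation; the quality of that linearity is exactly what the plot would show.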
It is likely that some (few?) individuals who are "into" this could
use the raw Excel data numbers, not just the plots you pointed us to (which
I still can't get my computer to see). I would appreciate it if you could
ship me one or two files with probability histograms (or occurrence density
or whatever) versus roundoff value that could be loaded into Excel 4.0 for
Mac.
It might be interesting to see if the shape of this log-lin plot follows
the one given by the cumulative distribution function à la Brian Beesley.
This function seems to have one degree of freedom: the number of rounded-off
values whose maximum gives your function, which in his
example he chose to be 100. You probably know from the code what that
number should be. (Another degree of freedom is the underlying uncertainty
or the "scaling factor" for the x axis.) You might also know from the code
what the "underlying" sigma should be under the assumption that it is
something like (sigma0 * SquareRoot(n)) where n is the number of roundings
that occur in the cumulative calculation and sigma0 is the typical
"elementary" round-off error (scaled properly to be in the same units as
the 0.5 critical roundoff limit). If this is not available a priori from
the code, it would still be interesting to estimate it by finding the best
fit to your empirical cumulative distribution function. If the resulting
smooth cumulative distribution function is a good match to the empirical
one it could serve as a guide for better extrapolation (or instead of the
extrapolation). If it is not a good match it would serve as a "warning"
that there may be "funny" things going on with the errors (particularly
worrisome if the "tail" is trending up, implying larger errors are more
likely than indicated by theory).
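If one wanted to try the Beesley-style fit, one crude approach - assuming the per-operation errors are roughly normal and the recorded value is the maximum of n of them (n = 100 in his example) - is a grid search over sigma, avoiding any fitting library:

```python
import math

def max_roundoff_cdf(x, sigma, n):
    # P(max of n i.i.d. |N(0, sigma^2)| values <= x)
    return math.erf(x / (sigma * math.sqrt(2.0))) ** n

def fit_sigma(xs, Fs, n=100):
    # brute-force least-squares fit of sigma to an empirical CDF (xs, Fs)
    best_err, best_sigma = float("inf"), None
    for i in range(1, 2000):
        s = i * 1e-4
        err = sum((max_roundoff_cdf(x, s, n) - F) ** 2 for x, F in zip(xs, Fs))
        if err < best_err:
            best_err, best_sigma = err, s
    return best_sigma
```

A good fit would support extrapolating the smooth CDF past the data; a tail rising above the fit would be the "warning" described above.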
Todd Sauke
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
------------------------------
Date: Tue, 11 May 1999 09:15:09 -0400
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Any statistics majors out there?
At 10:13 AM 5/11/99 +0000, you wrote:
>The theoretical distribution is most definitely not an F-distribution,
>though the shape may be reminiscent.
If we knew what the distribution is theoretically, we could fit it. I'm a
former temporary part-time adjunct instructor of statistics for a minor
university (really!), but it has been a long time and I don't have most of my
books at hand.
>Since the pdf of the normal distribution does not have a "pretty"
>analytic integral, the function I[0,x] is complex.
I've got a decent numerical integration routine (that I use for integrating the
Gaussian dist.)
>All assuming that the underlying distribution of the source data is
>normal - or near enough to make no practical difference!
>
If you're talking about the actual data George gave us - it is far from
normal. At first I was going to do a Kolmogorov-Smirnov test to see if it was
close to a normal, but that is really designed for the raw data. Since the
data was in a histogram I used the Chi-squared test. The data posted was far
from a normal distribution (even just the right half of it was far from a
normal).
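For reference, a minimal version of the chi-squared test Jud describes, binning a fitted normal into the histogram's bins (the edges and counts below are illustrative, not George's data):

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def chi_squared(observed, edges, mu, sigma):
    # Pearson statistic sum (O - E)^2 / E, expected counts from N(mu, sigma^2)
    total = sum(observed)
    stat = 0.0
    for o, a, b in zip(observed, edges, edges[1:]):
        e = total * (normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma))
        stat += (o - e) ** 2 / e
    return stat

# toy check: counts generated from the normal itself give a small statistic,
# while anything far from normal (like George's data) gives a huge one
edges = [-3, -2, -1, 0, 1, 2, 3]
probs = [normal_cdf(b, 0, 1) - normal_cdf(a, 0, 1) for a, b in zip(edges, edges[1:])]
observed = [round(10000 * p) for p in probs]
```

The statistic is then compared against a chi-squared critical value with (bins - 1 - fitted parameters) degrees of freedom.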
+------------------------------------------+
| Jud McCranie [EMAIL PROTECTED] |
+------------------------------------------+
------------------------------
Date: Tue, 11 May 1999 09:45:05 -0400
From: "Silverman, Bob" <[EMAIL PROTECTED]>
Subject: Mersenne: RE: Mersenne Digest V1 #554
Date: Mon, 10 May 1999 03:47:10 +0200 (MET DST)
From: [EMAIL PROTECTED]
Subject: Re: Mersenne: P587 factored?
Stuff deleted.......
>Conrad Curry is recruiting volunteers for M617 by SNFS.
>CWI is doing 2,1186L and 2,1198L.
- -----------> After M617, I'd like to see Conrad et al. do 2,613+. It
has NO known factors (other than 3). I would also like to see people
expend more ECM effort on 2^n+/-1 for composite n. These numbers have
been relatively neglected (relative to 2^p - 1).
>[2,1198L denotes 2^599 - 2^300 + 1, a divisor of 2^1198 + 1.]
>These will complete 2,n+- for n <= 600 and 2,nLM for n <= 1200.
>Under ten composite b^n +- 1 with b <= 12 and b^n <= 10^180
>remain in the Cunningham table.
- --------> I am doing 5,251+ now and will then do 5,257+. Bob
>From: Conrad Curry <[EMAIL PROTECTED]>
>Subject: Mersenne: M617 project
> On May 4 we found that the c142 of C(2,783+) = prp62 * prp80,
>where
...stuff deleted..
> This weekend sieving for M617 has started. Binaries for DOS and
>linux are available from ftp://ftp.netdoor.com/users/acurry/nfs
>Let me know if you want binaries for other platforms.
- ---------> I will provide source code for anyone who wants it. Bob
------------------------------
Date: Tue, 11 May 1999 20:50:00 +0200
From: Preda Mihailescu <[EMAIL PROTECTED]>
Subject: Re: Mersenne: ECM question
>
> > At Paul Zimmerman's ECM page,
> > http://www.loria.fr/~zimmerma/records/ecmnet.html
> > the optimal B1 value listed for finding 50-digit factors is 43000000, but
> > George's ECM factoring page uses 44000000 for the same purpose. Is one of
> > them wrong, or is there a reason for the difference?
>
> If I count the zeros right, these are 43 million and 44 million.
> The function being maximized, namely
>
> probability of finding a 50-digit factor on one curve, given that such exists
> -----------------------------------------------------------------------------
> time per curve
>
> is flat near its maximum. Implementation and platform differences
> can obviously affect the denominator (time per curve).
> The stage-2 strategy affects the numerator. The two optimal B1's
> are close enough to be considered the same.
>
>
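The quoted criterion amounts to choosing the B1 whose curves maximize success probability per unit time (equivalently, minimize expected total time). A toy sketch, with entirely made-up probabilities and timings:

```python
# hypothetical measurements: B1 -> (P(find a 50-digit factor per curve), seconds per curve)
measurements = {
    40_000_000: (1 / 9500, 4100.0),
    43_000_000: (1 / 8600, 4350.0),
    44_000_000: (1 / 8500, 4400.0),
    48_000_000: (1 / 8200, 4800.0),
}

def best_B1(meas):
    # maximize probability per unit time; note how flat the objective is
    # near the optimum - 43e6 and 44e6 are effectively tied here
    return max(meas, key=lambda b1: meas[b1][0] / meas[b1][1])
```

Because the objective is flat near its optimum, small implementation differences legitimately shift the reported "optimal" B1, which is the point of the reply.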
I was wondering about the choice of B1 and B2. Is it possible to elaborate
on the choice of these parameters as a function of the size of the expected
factors - without taking too much of your time?
Thank you,
Sincerely
Preda Mihailescu
------------------------------
Date: Tue, 11 May 1999 17:41:35 -0400
From: "Ernst W. Mayer" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Any statistics majors out there? (long)
After going home last night, I realized there is one flaw in my
analysis of yesterday: The convolution-size-based portion of the
roundoff error, i.e. the leftmost term in
> 2*log_2(radix) + C*[log_2 N + log_2 log_2 N] < it (20.6.E)
must also have a constant multiplying it, i.e. the equation should be
A*log_2(radix) + C*[log_2 N + log_2 log_2 N] < it. (20.6.E')
The two constants A and C should be deducible from actual behavior
of the algorithm, by doing a nonlinear least-squares fit of (20.6.E')
to the empirical (radix vs. N) datasets, assuming that (max. radix) for
any N satisfies (20.6.E') with equality replacing the inequality:
A*log_2(max. radix) + C*[log_2 N + log_2 log_2 N] = it. (20.6.E'')
Also note that A and C will depend on the digit representation used
(balanced-digit form vs. nonnegative-digit), the floating-point accuracy
of the hardware (including the rounding mode used), and the details of
the algorithmic implementation. To ease the algebra, let's simplify
the notation: let y = log_2(max. radix), z = log_2 N, so we have the
weakly nonlinear relation
A*y = it - C*[z + log_2 z]. (20.6.E''')
Further letting x = [z + log_2 z] yields a linear relation:
y = a0 + a1*x, where a0 = it/A, a1 = - C/A.
I have decently accurate (based on empirical testing) values of (max. radix)
vs. N for my Mlucas code (rightmost column based on analysis below):
z = log_2 N x = [z + log_2 z] y = log_2(radix) y from LS fit
10 13.322 21.97 22.00
11 14.459 21.73 21.71
12 15.585 21.48 21.43
13 16.700 21.15 21.14
14 17.807 20.90 20.86
15 18.907 20.45 20.58
16 20.000 20.29 20.30
17 21.087 19.99 20.02
18 22.170 19.84 19.74
19 23.248 19.45 19.47
Plotting these yields something very linear-plus-scatter looking, so
our relation (20.6.E'') looks like it indeed fits the observed data well.
For the least-squares fit, we need the following arithmetic averages,
where <f> := sum_{i=1...n}(f_i)/n :
<x> = 18.3285, <y> = 20.725, <x*y> = 377.30, <x^2> = 345.95, <y^2> = 430.18.
The slope (a1) and y-intercept (a0) of our best-fit line are given by
<x*y> - <x>*<y> <y>*<x^2> - <x>*<x*y>
a1 = --------------- = -.2554, a0 = --------------------- = <y>-a1*<x> = 25.406.
<x^2> - <x>^2 <x^2> - <x>^2
These yield constants A = 53/a0 = 2.086, C = -a1*A = 0.5328 in (20.6.E'),
so both assumed constants are positive and order of unity, as expected.
We also need the following quantities (the residual and total sums of squares):
S_r = n*[(<y^2> - <y>^2) -a1*(<x*y> - <x>*<y>)] = .01006585
S_t = n*(<y^2>-<y>^2) = 6.54375
The standard error (sigma) and correlation coefficient (r) are
sigma = sqrt(S_r/(n-2)) = .0354716, r = sqrt((S_t-S_r)/S_t) = .9992306,
and the fact that r is very close to unity indicates a very good correlation.
For George's breakpoint data at http://www.mersenne.org/status.htm,
using that log_2(radix) = (max. p)/N:
N z=log_2N x = [z+log_2 z] pmax y=log_2(radix) y (LS) pmax(LS)
98304 16.585 20.637 2000000 20.345 20.377 2003128
114688 16.807 20.878 2330000 20.316 20.321 2330605
131072 17.000 21.087 2655000 20.256 20.273 2657228
163840 17.322 21.436 3310000 20.203 20.193 3308342
196608 17.585 21.721 3960000 20.142 20.127 3957082
229376 17.807 21.962 4605000 20.076 20.071 4603841
262144 18.000 22.170 5260000 20.065* 20.023 5248952
327680 18.322 22.517 6540000 19.958 19.943 6534954
393216 18.585 22.801 7820000 19.887 19.878 7816179
458752 18.807 23.041 9090000 19.815 19.822 9093472
524288 19.000 23.248 10380000 19.798* 19.774 10367499
655360 19.322 23.594 12895000 19.676 19.695 12907054
786432 19.585 23.877 15400000 19.582 19.629 15437114
917504 19.807 24.115 17945000 19.558 19.574 17959583
1048576 20.000 24.322 20500000 19.550* 19.527 20475156
These give a best-fit line slope and intercept of
a1 = -.23073, a0 = 25.13845, (corresponding to A = , C = )
where the smaller (in magnitude) slope reflects the slightly greater numerical
precision Prime95 enjoys over its non-x86 counterparts. The standard error and
correlation coefficient are
sigma = .0244426, r = .9964081.
Thus, the Prime95 data are slightly less well-correlated (when fitted to the
assumed function) than those for Mlucas. This may reflect a few slightly
outlying points, indicated by asterisks to the right of their y-value in
the above table. Note that all the outliers correspond to power-of-two FFT
lengths. It may be that Prime95 enjoys slightly better accuracy when the
FFT length is a power of two, although it's not clear to me why this should
be the case (I don't see this behavior with my code, but it's not been used
as much, either). Perhaps careful examination of the radix-3, 5 and 7 front
end routines (George's terminology) might turn up some sequences of floating-
point operations that tend to suffer larger roundoff errors than those found in
power-of-2 routines (where the symmetry of the operations makes instruction
scheduling somewhat easier).
FURTHER REFINEMENTS: Richard Crandall writes, regarding my convolution
error estimate:
>For X steps of an error-accumulation walk, the estimate
>is yet worse for the *maximum* deviation. I assume
>the maximum is in fact the entity of interest, because
>of the nature of the convolution problem. The known statistic
>is that for X steps the maximum random walk deviation is
>
> Sqrt[X] Log X
>
>with probability one, in the sense of the Fisher theorem.
>Because of the Log X factor, the need for "a few" multiples
>of Sqrt[number of operations] has the "few" growing
>out of control.
>
>I believe a practical factor is say
>
> (1 + B Log[x]) Sqrt[x]
>
>for some empirical constant B. I have not tested this,
>but believe it so.
>
>There might actually be yet another factor of Log Log X,
>I would have to check. At any rate, the maximum
>convolution errors should grow faster than just the Sqrt[.].
This would imply that my A*log_2(radix) term in (20.6.E') should
in fact include a log_2 N (and perhaps also a log_2 log_2 N) term.
I leave it as an exercise to the reader as to whether this leads to
a better correlation with the observed data.
- -Ernst
------------------------------
Date: Tue, 11 May 1999 23:06:15 -0000
From: [EMAIL PROTECTED]
Subject: Mersenne: Optimizer for Prime95
Hi everyone,
I have posted a copy of my ReCache program, which allows Prime95 to be
run consistently at its best speed, on my anonymous ftp server, at URL
ftp://lettuce.edsc.ulst.ac.uk/gimps/software/ReCache.zip
Feel free to distribute / post copies on your local mirrors etc. as you
see fit.
Regards
Brian Beesley
------------------------------
Date: Tue, 11 May 1999 23:02:24 -0400 (EDT)
From: lrwiman <[EMAIL PROTECTED]>
Subject: Mersenne: Convincing administrators
All,
I'm going to try to get my school to install Prime95 on
their >100 PII's. Does anyone have experience dealing with
large stupid bureaucracies? Any pointers? Who should I try and
talk to first?
Thank you in advance,
Lucas Wiman
------------------------------
Date: Tue, 11 May 1999 21:36:12 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Convincing administrators
> All,
> I'm going to try to get my school to install Prime95 on
> their >100 PII's. Does anyone have experience dealing with
> large stupid bureaucracies? Any pointers? Who should I try and
> talk to first?
First rule when dealing with large stupid bureaucracies: If it's a bellco,
forget about it.
Seriously though, I'd check out the web page done by the TempleU-CAS team
(http://etc.temple.edu/gimps/) as one example of a large team using lots of
computers successfully. Another is the excellent "case study" by the
University at Albany (http://hawk.fab2.albany.edu/mersenne/mersenne.htm)
which again explores the use of "idle priority" processes on computers.
I only wish a certain Bellco were nice enough to have let me run it on their
computers, but hey, c'est la vie.
One approach I might suggest would be to talk it over with the "technical"
gurus in charge and get them to agree that it is a safe, low impact, low
risk project. With technical muscle on your side, you can then approach the
administration about it. Be sure to tout it as a "research" project
(schools should like that) and be sure to point out that any award money
collected would go straight back to them, hopefully to invest in further
such projects.
Aaron
------------------------------
Date: Wed, 12 May 1999 12:09:57 +0200
From: Alex Kruppa <[EMAIL PROTECTED]>
Subject: Re: Mersenne: ECM question
Hi all,
I have a different question concerning P-1 and ECM.
Some time ago I asked which power to put small primes
into when multiplying them into E ( factor = gcd(a^E-1,N) ).
Paul Leyland, I believe, replied that the power for prime p should
be trunc( ln(B1) / ln(p) ) ( log(B1) with base p ),
where B1 is the bound up to which we put primes into E.
But what if there is a stage 2 with a higher bound B2?
Should it be trunc( ln(B2) / ln(p) ) then? Or still the stage 1 bound?
In his Diplomarbeit (diploma thesis) about ECM
( see ftp://ftp.informatik.tu-darmstadt.de/pub/TI/reports/berger.diplom.ps.gz ),
Franz-Dieter Berger mentions on page 40f that his experience
shows that it is better to use the stage 2 bound.
Any opinion from the factoring gurus here on the list?
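Paul Leyland's rule, sketched in Python with B standing for whichever bound one adopts (B1, or B2 following Berger's observation); the exact integer loop sidesteps floating-point trouble in trunc(ln(B)/ln(p)):

```python
def prime_power_in_E(p, B):
    # largest k with p**k <= B, i.e. trunc(ln(B)/ln(p)) computed exactly
    k, q = 0, p
    while q <= B:
        k += 1
        q *= p
    return k

def stage1_exponent(primes, B):
    # E = product over primes p <= B of p**k
    E = 1
    for p in primes:
        E *= p ** prime_power_in_E(p, B)
    return E
```

Taking B = B2 makes stage 1 cover every prime power up to B2, not just up to B1, so a group order containing p**a with p**a <= B2 is already caught in stage 1; that is the consistency argument behind Berger's recommendation.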
Ciao,
Alex.
------------------------------
Date: Wed, 12 May 1999 16:14:45 -0400
From: "Pierre Abbat" <[EMAIL PROTECTED]>
Subject: Mersenne: Factoring bignums
Is there a program for factoring numbers up to, say, 2^128 in a reasonable
time? I tried bc but it doesn't have a factor command, so I wrote a loop and it
spent all its time outputting.
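Numbers up to roughly 2^128 are comfortably handled by trial division plus Pollard's rho with a Miller-Rabin primality test; a minimal Python sketch (not a production factoring tool):

```python
import math
import random

def is_probable_prime(n):
    # Miller-Rabin with a fixed base set (deterministic below ~3.3e24)
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def pollard_rho(n):
    # Floyd-cycle rho; returns a nontrivial factor of an odd composite n
    while True:
        c = random.randrange(1, n)
        f = lambda v: (v * v + c) % n
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = f(x)
            y = f(f(y))
            d = math.gcd(abs(x - y), n)
        if d != n:
            return d

def factor(n):
    # sorted list of prime factors with multiplicity
    if n == 1:
        return []
    if is_probable_prime(n):
        return [n]
    if n % 2 == 0:
        return [2] + factor(n // 2)
    d = pollard_rho(n)
    return sorted(factor(d) + factor(n // d))
```

Rho pulls out a prime factor p in roughly O(sqrt(p)) steps, so a 128-bit number that is the product of two ~64-bit primes is near the method's practical limit; for such worst cases ECM or a quadratic sieve is the usual next step.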
phma
------------------------------
Date: Wed, 12 May 1999 22:45:07 PDT
From: "david campeau" <[EMAIL PROTECTED]>
Subject: Mersenne: LLL
I don't know what is going on, but LLL has about 6000 exponents reserved and
it is still asking for more! If this continues through the night, there should
not be any exponents available this morning.
David,
P.S. current number of exponents available 5351 (5AM UTC)
------------------------------
End of Mersenne Digest V1 #555
******************************