George Woltman wrote:

>I wrote a program to analyze P-1 factoring success rate.

Yes, that's what I hoped to see. And your stage 2 work estimates
are in fact much too pessimistic - more on this shortly.

>I assumed that GCDs are free (they certainly aren't)

Even if an individual GCD costs much more than a modular squaring,
one can neglect the cost since one needs to do very few of them.
For a precomputed stage 1 primes product, one must wait until stage 1
is complete to take a GCD. In stage 2, one only needs to check for
a factor every so often, say every few hours.
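
Just to fix the shape of things, here is a bare-bones stage 1 in Python
(illustrative only: the function names, the default base and the little
sieve are mine, and a real implementation would of course use dedicated
modular-squaring code). Note the single GCD once the entire primes
product has been exponentiated:

from math import gcd, isqrt

def primes_up_to(n):
    # Simple sieve of Eratosthenes.
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [p for p in range(2, n + 1) if sieve[p]]

def pminus1_stage1(N, B1, base=3):
    # P-1 stage 1: raise `base` to every prime power <= B1, working mod N.
    x = base
    for p in primes_up_to(B1):
        pk = p
        while pk * p <= B1:      # take the full prime power p^k <= B1
            pk *= p
        x = pow(x, pk, N)
        # No GCD here: with a precomputed primes product there is nothing
        # worth checking until all of stage 1 is done.
    return gcd(x - 1, N), x      # a result > 1 is a factor; x is the residue

# Example: 193707721 divides 2^67 - 1, and 193707720 = 2^3*3^3*5*67*2677 is
# 20000-smooth, so stage 1 alone should expose that factor (unless the
# cofactor's p-1 happens to be just as smooth, in which case the GCD is N):
d, H = pminus1_stage1(2**67 - 1, B1=20000)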

I'm not presuming the fast Lehmer GCD I wrote for Alpha/MIPS will
port directly to x86, since it needs 8-byte ints and a 128-bit multiply.
On the other hand, if the native architecture has at least a 64-bit
integer multiply (low 64 bits of product of two 64-bit integers), one
can generate the high part via a floating multiply and some simple
error-correction operations. On x86 with its 64-bit floating mantissa,
one could generate a full 128-bit product (or perhaps 127, since one
needs one redundant bit in the low and high parts of the product to
effect the error-correction step) this way, and since the FMULs, despite
their latency, are highly pipelineable, one should be able to obtain good
performance - it's worth checking whether the GMP GCD for x86
does things in this fashion.
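
In scaled-down form the trick looks like this (a Python sketch, with the
53-bit double mantissa and 32-bit operands standing in for the 64-bit
x86 mantissa and 64-bit operands; the function name is mine):

def mulhi32(a, b):
    # Recover the high 32 bits of a 32x32-bit product from (i) the exact
    # low 32 bits, which the integer multiplier gives for free, and (ii) a
    # floating-point approximation of the full product.
    lo = (a * b) & 0xFFFFFFFF     # exact low half (free on real hardware)
    approx = float(a) * float(b)  # within about 2^11 of the true product
    # Error correction: the true product is the unique value congruent to
    # lo (mod 2^32) lying near approx, so snap to it.
    return int(round((approx - lo) / 2.0**32))

# Quick self-check against exact arithmetic:
import random
for _ in range(10000):
    a, b = random.getrandbits(32), random.getrandbits(32)
    assert mulhi32(a, b) == (a * b) >> 32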

And if we ever need a truly fast GCD, one can in fact program one that
takes essentially O(n log n) operations, as opposed to the O(n^2) of the
usual GCD algorithms (binary, Lehmer, or Weber). It wouldn't be easy to
program, but it can be done.
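
(For contrast, the quadratic methods are tiny - here is the textbook
binary GCD in a few lines of Python. The subquadratic algorithm replaces
these digit-at-a-time reduction steps with a recursive "half-GCD" applied
to the leading words, and that recursion is where the programming effort
goes.)

def binary_gcd(u, v):
    # Textbook binary GCD: shifts, subtractions and parity tests only.
    # On n-bit inputs this is O(n) iterations of O(n)-bit work, i.e. O(n^2).
    if u == 0 or v == 0:
        return u | v
    shift = 0
    while (u | v) & 1 == 0:      # pull out the common power of two
        u >>= 1
        v >>= 1
        shift += 1
    while u & 1 == 0:            # surplus twos in u are not part of the GCD
        u >>= 1
    while v:
        while v & 1 == 0:
            v >>= 1
        if u > v:
            u, v = v, u
        v -= u                   # u and v are both odd here, so v - u is even
    return u << shift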

>[I assumed] that factors longer than 87-bits won't change the
>calculations much.

They won't have much impact in the statistical sense, but don't
underestimate the "fun factor" - it's much more fun to do factoring
if one can speed the search for primes and occasionally find a nice
fat factor (in the 30-40 digit range), to boot. Even if p-1 were
"revenue neutral" in the sense that it gave roughly the same cost/
benefit ratio as straightforward sieving, I would prefer it simply
for the extra variety and the chance of finding a big factor.

>Analysis of p=10000019, how_far_factored=64, B1=20000, B2=400000
>Pass 1 squarings: 28820 or 0.2882%
>Pass 2 squarings: 31598 or 0.315979%

Your stage 2 work estimate is far too pessimistic, since it reflects
the cost of doing a simple "slow" stage 2, where processing a typical
prime takes about 5% of the work the same prime would take in stage 1.
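
In skeleton form a "slow" stage 2 looks like this (again an illustrative
Python sketch, with names and the gcd_interval parameter invented for the
example). Stepping from one stage 2 prime to the next costs a couple of
modular multiplies, versus the roughly log2(q) squarings the same prime
would cost inside stage 1, in line with the few-percent figure above:

from math import gcd, isqrt

def primes_in(lo, hi):
    # All primes in (lo, hi], via a simple sieve (fine at these sizes).
    sieve = bytearray([1]) * (hi + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, isqrt(hi) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [q for q in range(lo + 1, hi + 1) if sieve[q]]

def pminus1_stage2_slow(N, H, B1, B2, gcd_interval=1000):
    # H is the stage 1 residue.  For each prime q in (B1, B2] we need H^q;
    # precomputing H^g for every prime gap g makes each step one multiply,
    # plus one more to fold (H^q - 1) into the accumulator.
    qs = primes_in(B1, B2)
    if not qs:
        return 1
    gaps = {qs[i + 1] - qs[i] for i in range(len(qs) - 1)}
    Hgap = {g: pow(H, g, N) for g in gaps}
    Hq = pow(H, qs[0], N)        # one exponentiation to reach the first prime
    acc = (Hq - 1) % N
    for i in range(1, len(qs)):
        Hq = (Hq * Hgap[qs[i] - qs[i - 1]]) % N   # step to the next prime
        acc = (acc * (Hq - 1)) % N                # accumulate (H^q - 1)
        if i % gcd_interval == 0:                 # factor check only occasionally
            d = gcd(acc, N)
            if 1 < d < N:
                return d
    return gcd(acc, N)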

If one instead does a fast polynomial-evaluation stage 2, even one
with modest circular convolution length (to keep memory requirements
reasonable), the stage 2 depth for a given amount of work should rise
dramatically, perhaps by 2-3 orders of magnitude. One needs to do an
extended GCD at the start of stage 2 to generate the modular inverse
of the stage 1 residue, but this costs no more than an additional
regular GCD - in fact, one can do the regular and EGCD at the end
of stage 1 together.
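
To spell out that boundary step (an illustrative Python sketch; the
helper names are mine): the extended GCD is the same Euclid-style loop
with the Bezout cofactors carried along, and the cofactor of the stage 1
residue is exactly the modular inverse stage 2 wants.

from math import gcd

def xgcd(a, b):
    # Extended Euclid: returns (g, s, t) with s*a + t*b = g = gcd(a, b).
    s0, s1, t0, t1 = 1, 0, 0, 1
    while b:
        q = a // b
        a, b = b, a - q * b
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return a, s0, t0

def stage1_to_stage2(N, H):
    # The ordinary end-of-stage-1 check is gcd(H - 1, N); the extended GCD
    # of H against N then yields H^-1 (mod N), the inverse needed by a
    # polynomial-evaluation stage 2, for about the cost of one more GCD.
    d = gcd(H - 1, N)
    if 1 < d < N:
        return d, None           # stage 1 already found a factor
    g, s, _ = xgcd(H % N, N)
    if g != 1:
        return g, None           # a nontrivial gcd(H, N) is itself a find
    return None, s % N           # H * (s % N) == 1 (mod N)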

Of course this all requires significantly more programming effort
than simply extending the capabilities of a simple sieving code,
but I'm convinced that in the long run it will be well worth it.

-Ernst
