> If one instead does a fast polynomial-evaluation stage 2, even one
> with modest circular convolution length (to keep memory requirements
> reasonable), the stage 2 depth for a given amount of work should rise
> dramatically, perhaps 2-3 orders of magnitude. One needs to do an
> extended GCD at the start of stage 2 to generate the modular inverse
> of the stage 1 residue, but this costs no more than an additional
> regular GCD - in fact, one can do the regular and EGCD at the end
> of stage 1 together.
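        As a concrete illustration of the extended-GCD step mentioned
above: stage 2 needs the modular inverse of the stage 1 residue, and
that inverse falls straight out of an extended Euclid run. A minimal
Python sketch (the function names and the argument order are my own,
not anything from an actual p-1 implementation):

```python
def egcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    x0, x1, y0, y1 = 1, 0, 0, 1
    while b:
        q, a, b = a // b, b, a % b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def mod_inverse(r, n):
    """Inverse of the stage 1 residue r modulo n, via the EGCD above."""
    g, x, _ = egcd(r % n, n)
    if g != 1:
        # gcd(r, n) > 1 means stage 1 already turned up a factor of n
        raise ValueError("residue shares the factor %d with n" % g)
    return x % n
```

Note that the EGCD is no more expensive than the plain GCD already done
at the end of stage 1, which is the point being made in the quote.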

        Errm, 2-3 orders of magnitude seems far too high an
improvement.  You are not going to want to do too much work in stage 2,
and with short convolutions (at 2 MB per number the convolutions will
be mighty short) the gains will not be nearly that great.

> 
> Of course this all requires significantly more programming effort
> than simply extending the capabilities of a simple sieving code,
> but I'm convinced that in the long run it will be well worth it.
> 
        A while ago I had managed to convince myself that 3000000 was
the break-even point for p-1 (this was with a stage 2 using 16 (IIRC)
registers and 6th-degree polynomials).  I had for a while attempted to
argue that p-1 was effective at smaller ranges if one considered the
cost of double checking.
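
        For reference, here is the p-1 method under discussion in
miniature: stage 1 to a bound b1, plus a naive one-prime-at-a-time
stage 2 to b2 (none of the register/polynomial batching described
above -- that machinery exists precisely to replace this inner loop).
The bounds and the test factors are illustrative assumptions of mine,
not the parameters from my break-even estimate:

```python
from math import gcd

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i in range(n + 1) if sieve[i]]

def pminus1(n, b1, b2):
    """Pollard p-1: stage 1 to b1, naive stage 2 to b2."""
    x = 2
    for p in primes_up_to(b1):
        pk = p                      # largest power of p not exceeding b1
        while pk * p <= b1:
            pk *= p
        x = pow(x, pk, n)
    g = gcd(x - 1, n)
    if 1 < g < n:
        return g
    if g == n:
        return None                 # unlucky: retry with a smaller b1
    # Stage 2: hope p - 1 = (b1-smooth part) * q for one prime q <= b2.
    # Real implementations batch these gcd's via fast polynomial
    # evaluation instead of taking them one prime at a time.
    for q in primes_up_to(b2):
        if q <= b1:
            continue
        g = gcd(pow(x, q, n) - 1, n)
        if 1 < g < n:
            return g
    return None
```

For example, 527707 = 1009 * 523 with 1008 = 16 * 9 * 7, so stage 1
with b1 = 20 already pulls out the factor 1009.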
        Deinst.
