Dienst writes (in response to Ernst Mayer)

> > If one instead does a fast polynomial-evaluation stage 2, even one
> > with modest circular convolution length (to keep memory requirements
> > reasonable), the stage 2 depth for a given amount of work should rise
> > dramatically, perhaps 2-3 orders of magnitude. One needs to do an
> > extended GCD at the start of stage 2 to generate the modular inverse
> > of the stage 1 residue, but this costs no more than an additional
> > regular GCD - in fact, one can do the regular and EGCD at the end
> > of stage 1 together.
> 
> Errm, 2-3 orders of magnitude seems far too high an
> improvement. You are not going to want to do too much work in stage 2
> and with short convolutions (at 2 MB per number the convolutions will
> be mighty short) the gains will not be near that great.
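
       As an aside on the extended GCD mentioned in the quoted text: a
minimal sketch (illustrative Python, not anyone's actual code) of how a
single extended GCD pass yields both gcd(r, N) -- the usual stage 1
factor check -- and, when that gcd is 1, the modular inverse of the
stage 1 residue needed to begin stage 2:

    def ext_gcd(a, b):
        # Iterative extended Euclid: returns (g, x, y)
        # with g = gcd(a, b) and a*x + b*y = g.
        x0, x1, y0, y1 = 1, 0, 0, 1
        while b:
            q, a, b = a // b, b, a % b
            x0, x1 = x1, x0 - q * x1
            y0, y1 = y1, y0 - q * y1
        return a, x0, y0

    def gcd_and_inverse(residue, N):
        # One pass gives the factor check (g) and the inverse of the
        # residue mod N when g == 1 (otherwise no inverse exists).
        g, x, _ = ext_gcd(residue % N, N)
        return g, (x % N if g == 1 else None)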

         Using a two-dimensional FFT, the time to multiply two bivariate 
polynomials (degree nx-1 in X, degree ny-1 in Y) modulo X^nx - 1 and Y^ny - 1 
is O(nx*ny*log(nx*ny)) = O(nx*ny*(log(nx) + log(ny))).
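
       To make the operation concrete, here is a minimal sketch (ordinary
floating-point coefficients via numpy, not multi-precision Mersenne
residues) of multiplying two bivariate polynomials modulo X^nx - 1 and
Y^ny - 1 with a two-dimensional FFT; pointwise multiplication of the
2-D transforms is exactly the doubly cyclic convolution:

    import numpy as np

    def cyclic_poly_mult_2d(a, b):
        # a[i, j] and b[i, j] hold the coefficients of X^i * Y^j.
        # Multiplying the 2-D DFTs pointwise and inverting gives the
        # product reduced modulo X^nx - 1 and Y^ny - 1.
        return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

    nx, ny = 32, 8
    a = np.random.randint(0, 100, (nx, ny)).astype(float)
    b = np.random.randint(0, 100, (nx, ny)).astype(float)
    c = cyclic_poly_mult_2d(a, b)    # nx-by-ny coefficient array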

       Suppose the Mersenne multiplications are length 262144.
We want to multiply two polynomials of degree 31 with
Mersenne coefficients, reducing the product modulo X^32 - 1
(32 is the `short' convolution length).
The time is about 262144*32*(18 + 5), since log2(262144) = 18 and log2(32) = 5.
The time for a single Mersenne multiplication is about 262144*18.
The time for a convolution of length 32 is about 
32*(18 + 5)/18 ~= 41 times that for a single multiplication.
The classical method for polynomial multiplication would
have a ratio of 1024, and Karatsuba a ratio of 243.
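
       The arithmetic above can be checked in a few lines, under the same
cost model as in the text: a length-N multiplication costs about
N*log2(N), and the bivariate FFT product costs about
N*nx*(log2(N) + log2(nx)):

    from math import log2

    N, nx = 262144, 32
    single   = N * log2(N)                    # one Mersenne multiplication, ~262144*18
    poly_fft = N * nx * (log2(N) + log2(nx))  # FFT convolution of length 32
    print(poly_fft / single)                  # 32*(18 + 5)/18 ~= 40.9
    print(nx ** 2)                            # classical method: 1024
    print(round(3 ** log2(nx)))               # Karatsuba: 3^5 = 243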

       Yes, the convolution size is limited by our memory space.
But memories are growing, and the code may not be ready for general
use until 2000.
