I might have found a way of speeding up NTT that are usually slower than FFT.

Firts, as some people may not know Number Theoritic Transfoms ( NTT ) , I 
am going to remember few ideas about them. NTT are also called FFT modulo a 
prime. Instead of using complex numbers, NTT compute wxith integer modulo a 
prime p. Thus, all operations are done modulo p ( that is to say, in the 
ring of integers modulo p) .

There are restrictions for the choice of p :
- the length l of then NTT must divide (p-1) ;
- the radix b must be lower than p ;
- p must be prime and (p-1)highly composite.
With 32 bit integers , a common choice is : p=13*2^28+1 ; b=2^28.
So, the number M represented by an array t[0..(l-1)] of l elements is :
M=t[0]+t[1]*b+...+t[l-1]*b^(l-1)=t[0]+t[1]*2^28+...+t[l-1]*(2^28)^(l-1) .

Usually, we do several NTT modulo different primes and then use the Chinese 
Remainder Theorem to get t[0] ;...; t[i] ;...; t[l-1] modulo p, because the 
terms in the central loop of the transform can exceed the largest integer 
that fits in a word machine (for instance, 2^32-1) . That's one of the 
reasons why NTT are slower than FTT.

Why don't we compute these terms modulo p at each step of the central loop ?
Consequently, it would be useless to do several NTT and the Chinese 
Remainder Theorem would not be required.

Even if I am wrong, the speed of NTT can be improved by using 64 bit 
integers ( shorter NTT length ) supported by both new processors and new C 
compilers, and by using Intel MMX instructions.

Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

Reply via email to