I might have found a way of speeding up NTT that are usually slower than FFT.
Firts, as some people may not know Number Theoritic Transfoms ( NTT ) , I am going to remember few ideas about them. NTT are also called FFT modulo a prime. Instead of using complex numbers, NTT compute wxith integer modulo a prime p. Thus, all operations are done modulo p ( that is to say, in the ring of integers modulo p) . There are restrictions for the choice of p : - the length l of then NTT must divide (p-1) ; - the radix b must be lower than p ; - p must be prime and (p-1)highly composite. With 32 bit integers , a common choice is : p=13*2^28+1 ; b=2^28. So, the number M represented by an array t[0..(l-1)] of l elements is : M=t[0]+t[1]*b+...+t[l-1]*b^(l-1)=t[0]+t[1]*2^28+...+t[l-1]*(2^28)^(l-1) . Usually, we do several NTT modulo different primes and then use the Chinese Remainder Theorem to get t[0] ;...; t[i] ;...; t[l-1] modulo p, because the terms in the central loop of the transform can exceed the largest integer that fits in a word machine (for instance, 2^32-1) . That's one of the reasons why NTT are slower than FTT. Why don't we compute these terms modulo p at each step of the central loop ? Consequently, it would be useless to do several NTT and the Chinese Remainder Theorem would not be required. Even if I am wrong, the speed of NTT can be improved by using 64 bit integers ( shorter NTT length ) supported by both new processors and new C compilers, and by using Intel MMX instructions. _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers