"Richard Foster" <r-fos...@gmx.com> writes: > Even it looks OK in today's code, does Nisse's new FFT code change > anything?
The small-prime FFT code uses fft-based multiplication of polynomials over Z_p for a couple of single-limb primes p. Usually in the range from 3 to 5 primes, depending on how the input size fits constraints on transform size, and it's probably beneficial to go a bit beyond 5. The FFTs are organized to spend most of the time doing small FFTs fitting the L1 cache and implemented as tight assembly loops, and on top of that somewhat hairy logic doing twiddle factors, data transpositions, truncated fft logic, and the like. Regarding threading, the best strategy will likely differ depending on the number of cores you target. For a small number of cores, it seems reasonable to let one core handle each p, doing all of forward transforms, pointwise multiply, and inverse transforms. They would then work independently, with only an initial pass to distrivute the mod p work to threads, and a final pass to combine mod p results using crt and assemble the final product. I'm not that familiar with threading, but I'd expect a reasonable speedup for large operands, aiming to have each core mainly exercise its local L1 cache. If cores end up competing for available cache or memory bandwidth, gain will be smaller. It's also possible to try to spawn worker threads to do the smallest mod p FFTs, but as Torbjörn says, there's a risk that lots of data has to be passed from one core to another. So I'd first try to split computation work between threads in much larger pieces than a single small FFT at a time. As for performance, the assembly loops when I worked on this (which is close to 10(!) years ago) ran at some 8-10 cycles per butterfly, on the AMD Opteron which was high end at the time. I'm aware of a trick by David Harvey, who got running time down to only 6 cycles per butterfly (totally saturating the multiplication units) on the same hardware, by using primes close to 2^63 rather than close to 2^64. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel