David Harvey <d.har...@unsw.edu.au> writes: >> I guess you could mitigate this a bit by rounding u up to get a >> polynomial with reasonably low weight. > > Yes, but doesn't this miss the whole point???
You could round up so you get at most, say, three factors (x^{2^k_j} - 1). That should reduce the performance steps quite a lot compared to using a single factor x^{2^k} - 1. Anyway, doing this TFT-like processing on the input integer instead seems preferable, as you have explained. For wraparound applications, I guess it would be nice if one could give the CRT reconstruction an additional input. So it gets x mod (2^u_j + 1) for a couple of j:s from the core FFT multiply routines, and x mod 2^u from the application, and then it reconstructs x. > My understanding is that O(n log n) of the work gets passed down to > full-size FFTs as you state above, but I'm talking here about the O(n) > work, at the top layer, including e.g. all the cross-butterflies. I see. I think I never got to even consider assembly optimzation for those. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel