Dear Niels, > From: [email protected] (Niels Möller) > Date: Mon, 15 Apr 2019 14:02:28 +0200 > > paul zimmermann <[email protected]> writes: > > > Schönhage-Strassen can be implemented with good cache locality: > > > > https://hal.inria.fr/inria-00126462 > > Thanks! As I read it, for large inputs, the top level transforms operate > on coefficients that fit in L2 cache but not L1 cache, and with several > passes over the data, depending on fft size. Is that right?
not quite. In Section 2.2.3 (Bailey's 4-step transform) we describe a way to please both the L2 and L1 caches. Of course this work should be revisited with modern processors. > Then I think small-prime fft has potential for better locality, with few > complete passes of the data and all the heavy fft work operating on data > in the L1 cache. maybe, I'm curious to see a comparison with our code! Paul _______________________________________________ gmp-devel mailing list [email protected] https://gmplib.org/mailman/listinfo/gmp-devel
