Dear Niels,

> From: [email protected] (Niels Möller)
> Date: Mon, 15 Apr 2019 14:02:28 +0200
> 
> paul zimmermann <[email protected]> writes:
> 
> > Schönhage-Strassen can be implemented with good cache locality:
> >
> > https://hal.inria.fr/inria-00126462
> 
> Thanks! As I read it, for large inputs, the top level transforms operate
> on coefficients that fit in L2 cache but not L1 cache, and with several
> passes over the data, depending on fft size. Is that right?

Not quite. In Section 2.2.3 (Bailey's 4-step transform) we describe a way
to make good use of both the L2 and L1 caches. Of course, this work should
be revisited on modern processors.
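
For readers unfamiliar with the four-step idea, here is a rough toy sketch.
It assumes a complex floating-point DFT rather than the arithmetic modulo
2^N+1 used in Schönhage-Strassen, and it is not the code from the paper.
The function names and the n1 x n2 split are chosen only for illustration.
A transform of length n1*n2 is broken into short transforms of length n1
and n2, so each short transform touches a small working set that can stay
in cache:

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT; serves as reference and as the short sub-transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_dft(x, n1, n2):
    """DFT of length n1*n2 via a four-step style decomposition (toy version)."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n1 x n2 row-major matrix: a[j1][j2] = x[j1*n2 + j2].
    # Step 1: n2 independent length-n1 DFTs, one per column.
    cols = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]
    # Step 2: multiply entry (k1, j2) by the twiddle factor w^(j2*k1),
    # where w = exp(-2*pi*i/n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    # Step 3: transpose so that each row k1 is contiguous.
    rows = [[cols[j2][k1] for j2 in range(n2)] for k1 in range(n1)]
    # Step 4: n1 independent length-n2 DFTs, one per row.
    rows = [dft(r) for r in rows]
    # rows[k1][k2] holds output coefficient X[k1 + n1*k2].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out

# Compare against the direct transform on a small input.
x = [complex(i * i % 7, i % 3) for i in range(12)]
max_err = max(abs(a - b) for a, b in zip(four_step_dft(x, 3, 4), dft(x)))
```

In a real implementation the short transforms would themselves be fast
transforms, and n1 and n2 would be chosen so that one row or column fits
in the targeted cache level.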

> Then I think small-prime fft has potential for better locality, with few
> complete passes of the data and all the heavy fft work operating on data
> in the L1 cache.

Maybe. I would be curious to see a comparison with our code!

Paul
_______________________________________________
gmp-devel mailing list
[email protected]
https://gmplib.org/mailman/listinfo/gmp-devel