On Wed, Dec 9, 2015 at 5:57 AM, 博陈 <chenphysic...@gmail.com> wrote:
> <https://lh3.googleusercontent.com/-lTsIsN0BaAY/VmgIypsEQ2I/AAAAAAAAAAk/n-j-ZalGl5I/s1600/QQ%25E6%2588%25AA%25E5%259B%25BE20151209185519.png>
> the optimization strategy for fft given by the official documentation
> seems to fail. Why?

You didn't mention exactly which optimization strategy you are trying, so I have to guess.

1. You should expect the first one to be no faster than the last one, since it's basically doing the same thing and the first one does it all in global scope.

2. Doing the operation in place doesn't make much of a difference here, since the operation itself is already very expensive (most of the time is spent in FFTW).

3. It doesn't really matter for this one (since FFTW determines the performance here), but you should benchmark the loop in a function and hoist the creation of the plan out of the loop. For your actual code, you might want to make the plan a global constant or a parametrized field of a type, since it has not been particularly type stable.

4. You can use `plan_fft(...., flags=FFTW.MEASURE)` to let FFTW select the best algorithm by actually measuring the time instead of guessing. It gives me a 20% to 30% speedup for your example and, IIRC, a larger speedup for small problems.

5. You can use `FFTW.flops(p)` to figure out how many floating point operations are needed to perform your transformation. On my computer, a MEASURE'd plan takes 4.3 s for 100 runs, and the naive estimate from assuming one operation per clock cycle is 2 s for 100 runs, so it's the right order of magnitude.
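A rough sketch of points 3 to 5, in case it helps. It is not the code from your screenshot: the `Transformer` type, the `run_plan!` function, the 512x512 size and the 3 GHz clock are all made up for illustration. It also assumes a current Julia where the FFT functions live in the FFTW.jl package (in 0.4 they are in Base), and that `FFTW.flops` returns a tuple of operation counts there.

```julia
using FFTW            # assumption: current Julia; in 0.4 these functions are in Base
using LinearAlgebra   # mul!
using Random          # rand!

# Hypothetical wrapper: the plan is a parametrized field, so its concrete
# type is known to the compiler inside run_plan!.
struct Transformer{P}
    plan::P
end

# Benchmark loop inside a function, with the plan created outside the loop.
function run_plan!(y, t::Transformer, x, n)
    for _ in 1:n
        mul!(y, t.plan, x)   # apply the plan, writing the result into y
    end
    return y
end

x = zeros(ComplexF64, 512, 512)   # assumed problem size
y = similar(x)

est  = Transformer(plan_fft(x))                        # default planner flags
meas = Transformer(plan_fft(x; flags=FFTW.MEASURE))    # FFTW times candidate algorithms

# FFTW may overwrite the arrays while planning with MEASURE,
# so fill in the data only after the plans exist.
rand!(x)

# Run once first so compilation is not included in the timings.
run_plan!(y, est, x, 1)
run_plan!(y, meas, x, 1)

t_est  = @elapsed run_plan!(y, est, x, 100)
t_meas = @elapsed run_plan!(y, meas, x, 100)
println("default plan: ", t_est, " s, MEASURE plan: ", t_meas, " s (100 runs each)")

# Naive estimate: one floating point operation per clock cycle.
ops = sum(FFTW.flops(meas.plan))   # assumption: summing all reported counts
clock_hz = 3.0e9                   # assumed clock rate, adjust for your machine
println("naive estimate: ", 100 * ops / clock_hz, " s for 100 runs")
```

The timings will depend on the machine and the problem size, so treat the comparison between the two plans as something to measure rather than something this sketch guarantees.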