Hi Steven,
OK that makes sense. But why is Julia 2x slower than Matlab for some
values of n (see below, the timing difference is consistent when looped over)?
Shouldn’t they both be using FFTW?
julia> r=rand(2000000);plan=plan_fft(r); @time plan*r;
0.080122 seconds (12 allocations: 61.035 MB, 12.91% gc time)
julia> r=rand(2000001);plan=plan_fft(r); @time plan*r;
0.287002 seconds (12 allocations: 61.036 MB, 3.76% gc time)
julia> r=rand(1999999);plan=plan_fft(r); @time plan*r;
0.218389 seconds (12 allocations: 61.035 MB, 4.76% gc time)
>> r=rand(2000000,1);tic();fft(r);toc()
Elapsed time is 0.023956 seconds.
>> r=rand(2000001,1);tic();fft(r);toc()
Elapsed time is 0.303320 seconds.
>> r=rand(1999999,1);tic();fft(r);toc()
Elapsed time is 0.131212 seconds.
> On 23 Sep 2015, at 3:52 am, Steven G. Johnson <[email protected]> wrote:
>
>
>
> On Tuesday, September 22, 2015 at 6:51:50 AM UTC-4, Sheehan Olver wrote:
> I get the following timings with the various FFTW routines, where I ran each
> line multiple times to make sure it was accurate. Why are REDFT00 and
> RODFT00 almost 10x slower?
>
> This is because a REDFT00 (DCT-I) of n points is exactly equivalent to a DFT
> of N=2(n-1) points with real-even data. Although FFTW uses a different
> algorithm, not a size-N complex FFT, the fast algorithms are essentially
> equivalent to pruned versions of the FFT algorithms for N, and depend on the
> factorization of N, not of n. Similarly for RODFT00 (DST-I), except N=2(n+1)
> in that case. In contrast, the other DCT/DST types correspond to DFTs of
> length N=2n or N=4n or N=8n, so their speed depends on the factorization of n.
>
> If n=100000 as in your example, then n-1 has large prime factors (41, 271),
> so the FFT algorithms are much slower, although they are still O(n log n).
> If you try n=100001, you will notice that REDFT00 becomes much faster (but
> other DCT types become slower).
>
> This is mentioned in the FFTW manual:
>
> http://www.fftw.org/doc/Real-even_002fodd-DFTs-_0028cosine_002fsine-transforms_0029.html
>
> "FFTW is most efficient when N is a product of small factors; note that this
> differs from the factorization of the physical size n for REDFT00 and
> RODFT00!"
>