Re: [julia-users] FFTW.REDFT00 and FFT.RODFT00 are ~10x slower than other routines?

Steven G. Johnson Fri, 25 Sep 2015 08:41:34 -0700


On Wednesday, September 23, 2015 at 9:07:51 PM UTC-4, Sheehan Olver wrote:
>
> OK that makes sense.  But then why is rfft on a vector length 2*(n-1) more 
> than 2x faster than FFT.REDFT00?
>
> *julia> **r=rand(100000);@time for k=1:100 FFTW.r2r(r,FFTW.REDFT00) end;*
>   2.496130 seconds (8.30 k allocations: 76.703 MB, 0.76% gc time)
> *julia> **r=rand(2*(100000-1));@time for k=1:100 rfft(r) end;*
>   0.943706 seconds (8.30 k allocations: 152.985 MB, 1.58% gc time)
>


The short answer is that rfft is much more optimized in FFTW than than the 
r2r transforms.

The long answer is REDFT00 transforms of *even* lengths n are especially 
bad, because the algorithms for this problem are not very attractive.   For 
an even length, the options are:

   1) use an algorithm from FFTPACK or Numerical Recipes that turns it into 
a rfft of the same length plus O(n) pre/post-processing.   We implemented 
this, but don't use it because it seems to suffer from severe accuracy 
problems: https://github.com/FFTW/fftw3/blob/master/reodft/redft00e-r2hc.c
    2) pad symmetrically to an rfft of twice the length.  This is accurate, 
but sacrices a factor of 2 in speed as you noticed: 
 https://github.com/FFTW/fftw3/blob/master/reodft/redft00e-r2hc-pad.c
    3) if n-1 has a small prime factor r, implement a pruned version of the 
analogous radix-r Cooley-Tukey algorithm.  This would work and be accurate, 
but is a lot of work to implement, and we didn't both except in the case 
where n is *odd* (so that we can use radix r=2, or actually split 
radix: https://github.com/FFTW/fftw3/blob/master/reodft/reodft00e-splitradix.c)
  

> PS  Why doesn't fft(::Vector{Float64}) automatically call rfft and 
> re-interpret the output? 
>

We could, but my feeling was that if you really care about factors of two 
in performance, you should be using rfft directly.  That also lets you save 
a factor of two in memory, and a factor of two in any post-processing 
computations as well, and additionally lets you save a factor of two in any 
subsequent inverse transform (if you need it, e.g. for convolutions or 
filtering).

Matlab, in contrast, doesn't expose an rfft interface, so you can't get all 
of the savings that are possible in this case.  So they might as well get 
what they can from fft(x).

Re: [julia-users] FFTW.REDFT00 and FFT.RODFT00 are ~10x slower than other routines?

Reply via email to