On Wednesday, September 23, 2015 at 9:07:51 PM UTC-4, Sheehan Olver wrote: > > OK that makes sense. But then why is rfft on a vector length 2*(n-1) more > than 2x faster than FFT.REDFT00? > > *julia> **r=rand(100000);@time for k=1:100 FFTW.r2r(r,FFTW.REDFT00) end;* > 2.496130 seconds (8.30 k allocations: 76.703 MB, 0.76% gc time) > *julia> **r=rand(2*(100000-1));@time for k=1:100 rfft(r) end;* > 0.943706 seconds (8.30 k allocations: 152.985 MB, 1.58% gc time) >
The short answer is that rfft is much more optimized in FFTW than than the r2r transforms. The long answer is REDFT00 transforms of *even* lengths n are especially bad, because the algorithms for this problem are not very attractive. For an even length, the options are: 1) use an algorithm from FFTPACK or Numerical Recipes that turns it into a rfft of the same length plus O(n) pre/post-processing. We implemented this, but don't use it because it seems to suffer from severe accuracy problems: https://github.com/FFTW/fftw3/blob/master/reodft/redft00e-r2hc.c 2) pad symmetrically to an rfft of twice the length. This is accurate, but sacrices a factor of 2 in speed as you noticed: https://github.com/FFTW/fftw3/blob/master/reodft/redft00e-r2hc-pad.c 3) if n-1 has a small prime factor r, implement a pruned version of the analogous radix-r Cooley-Tukey algorithm. This would work and be accurate, but is a lot of work to implement, and we didn't both except in the case where n is *odd* (so that we can use radix r=2, or actually split radix: https://github.com/FFTW/fftw3/blob/master/reodft/reodft00e-splitradix.c) > PS Why doesn't fft(::Vector{Float64}) automatically call rfft and > re-interpret the output? > We could, but my feeling was that if you really care about factors of two in performance, you should be using rfft directly. That also lets you save a factor of two in memory, and a factor of two in any post-processing computations as well, and additionally lets you save a factor of two in any subsequent inverse transform (if you need it, e.g. for convolutions or filtering). Matlab, in contrast, doesn't expose an rfft interface, so you can't get all of the savings that are possible in this case. So they might as well get what they can from fft(x).
