Hey there cool people! Seems like this topic will be pretty similar to this one <https://groups.google.com/forum/#!topic/julia-users/JsI2i80f-uU>.
So, I am trying to write my version of alignsignals <http://uk.mathworks.com/help/signal/ref/alignsignals.html?requestedDomain=www.mathworks.com>. In few words, I calculate the cross-correlation, find the peak, calculate its shift from the origin of time, use it to align the signals. I got three version of a function to achieve this, using xcorr, fft and rfft respectively. When using two signals one minute long, sampled at 48 kHz, they benchmark <https://github.com/johnmyleswhite/Benchmarks.jl> as follows: xcorr() version: ================ Benchmark Results ======================== Time per evaluation: 832.16 ms [663.40 ms, 1.00 s] Proportion of time in GC: 15.94% [14.32%, 17.57%] Memory allocated: 460.01 mb Number of allocations: 239 allocations Number of samples: 10 Number of evaluations: 10 Time spent benchmarking: 9.73 s rfft() version: ================ Benchmark Results ======================== Time per evaluation: 1.01 s [654.63 ms, 1.36 s] Proportion of time in GC: 12.11% [8.30%, 15.91%] Memory allocated: 306.21 mb Number of allocations: 344 allocations Number of samples: 6 Number of evaluations: 6 Time spent benchmarking: 7.58 s fft() version: ================ Benchmark Results ======================== Time per evaluation: 715.45 ms [536.86 ms, 894.04 ms] Proportion of time in GC: 20.37% [16.49%, 24.24%] Memory allocated: 569.87 mb Number of allocations: 276 allocations Number of samples: 13 Number of evaluations: 13 Time spent benchmarking: 10.56 s It seems that the fft() version is the fastest while the rfft() version is the one that allocates less memory. According to my understanding, rfft() should also be faster with respect fft(). Unfortunately, the FFTW plan in the post I linked above does not seem to work. Are you able to suggest how to plan rfft() to make it faster? I will have to align many data collections sampled at 96 kHz minium, usually at least 30 s long. I plan to do it in a loopy structure, so it would be of great advantage to have quickest execution and lowest memory usage. Here the code defining the functions: function alignsignals{T <: Real}(x::Array{T, 1}, u::Array{T, 1}) # Delay as lag of cross correlation peak from origin of time. lₓ = length(x) lᵤ = length(u) lₛ = lₓ + lᵤ - 1 sₓᵤ::Array{T, 1} = xcorr(x, u) if mod(lₛ, 2) == 0 ct_idx = fld(lₛ, 2) else ct_idx = fld(lₛ, 2) + 1 end _::Array{T, 1}, pk_idx::Array{Int, 1} = findmax(sₓᵤ, 1) δ::Int = ct_idx - pk_idx[1] # Align: y = zeros(T, lᵤ) if δ > 0 y[1:(lᵤ - δ)] = u[(δ + 1):lᵤ] else y[(-δ + 1):lᵤ] = u[1:(lᵤ - -δ)] end return y, δ end function rfftalignsignals{T <: Real}(x::Array{T, 1}, u::Array{T, 1}; ncores ::Int = 1) lₓ = length(x) lᵤ = length(u) FFTW.set_num_threads(ncores) x_nfft = nextprod(primes(10), lₓ) u_nfft = nextprod(primes(10), lᵤ) nfft = max(x_nfft, u_nfft) X = rfft([x; zeros(nfft - lₓ)]) U = rfft([u; zeros(nfft - lᵤ)]) Sₓᵤ = conj(X) .* U sₓᵤ = flipdim(fftshift(irfft(Sₓᵤ, 2 * length(Sₓᵤ) - 1)), 1) lₛ = length(sₓᵤ) if mod(lₛ, 2) == 0 ct_idx = fld(lₛ, 2) else ct_idx = fld(lₛ, 2) + 1 end _::Array{T, 1}, pk_idx::Array{Int, 1} = findmax(sₓᵤ, 1) δ::Int = ct_idx - pk_idx[1] # Align: y = zeros(T, lᵤ) if δ > 0 y[1:(lᵤ - δ)] = u[(δ + 1):lᵤ] else y[(-δ + 1):lᵤ] = u[1:(lᵤ - -δ)] end return y, δ end function fftalignsignals{T <: Real}(x::Array{T, 1}, u::Array{T, 1}; ncores:: Int = 1) lₓ = length(x) lᵤ = length(u) FFTW.set_num_threads(ncores) x_nfft = nextprod(primes(10), lₓ) u_nfft = nextprod(primes(10), lᵤ) nfft = max(x_nfft, u_nfft) X = fft([x; zeros(x_nfft - lₓ)]) U = fft([u; zeros(u_nfft - lᵤ)]) Sₓᵤ = conj(X) .* U sₓᵤ = abs(flipdim(fftshift(ifft(Sₓᵤ)), 1)) lₛ = length(sₓᵤ) if mod(lₛ, 2) == 0 ct_idx = fld(lₛ, 2) else ct_idx = fld(lₛ, 2) + 1 end _::Array{T, 1}, pk_idx::Array{Int, 1} = findmax(sₓᵤ, 1) δ::Int = ct_idx - pk_idx[1] # Align: y = zeros(T, lᵤ) if δ > 0 y[1:(lᵤ - δ)] = u[(δ + 1):lᵤ] else y[(-δ + 1):lᵤ] = u[1:(lᵤ - -δ)] end return y, δ end
