On Mon, Dec 14, 2015 at 1:12 PM, Yichao Yu <[email protected]> wrote:
> On Sun, Dec 13, 2015 at 10:35 PM, 博陈 <[email protected]> wrote:
>> This is my first julia code, I am happy it did the right thing, but compared
>> with the Matlab code that did the same thing, it runs so slowly. The Matlab
>> code takes about 90s, but the julia below takes 130s.
Another issue is that the code is not vectorizable (SIMD sense) due to
the way we store the complex array. (This is almost identical to the
simulation I am doing so I've run into this problem.) I might do some
more benchmarks on alternatives and report back later.
>
> There's a few problems with the code. See below. You are still
> creating many temporary arrays which is in general a bad idea.
>
>>
>> Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.) +
>> 1./sqrt((x-y)^2+1.)
>> Ko(x::Float64,y::Float64) = x^2+y^2
>>
>> function ground()
>> R = 320.
>> Ks = 2^11
>> x = linspace(-R, R, Ks+1)
>
> As a general advice, don't use this dimension for fft. If you replace
> you Ks with 2^11 - 1, you will see at least 2x speed up.
>
>> dx = 2R/Ks
>> dt = 0.1
>> pm = 2π/2dx
>> px = circshift( linspace(-pm, pm, Ks+1), round(Int64, Ks/2) )
>> FFTW.set_num_threads(8)
>>
>> V = Float64[Potential(ix, iy) for ix in x, iy in x]
>> ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px]
>> ko2 = ko.^2
>> vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x]
>> ϕ = Array(Complex128,(Ks+1,Ks+1))
>> Pl = plan_fft!(ϕ; flags=FFTW.MEASURE)
>> ϕ = V.*complex(1.)
>>
>> normphi = sqrt(sumabs2(ϕ))*dx
>> ϕ /= normphi
>>
>> ϕ = Pl*ϕ
>> ϕ .*= ko
>> ϕ = Pl\ϕ
>>
>> Δϕ = 1.
>> nstep = 0
>> print("start loop")
>> while Δϕ > 1.e-15
>> ϕ′ = ϕ
>
> You are better off pre-allocate the two buffers and swap them
>
>> ϕ .*= vo
>
> Use `scale!`
>
>>
>> normphi = sqrt(sumabs2(ϕ))*dx
>> ϕ /= normphi
>
> Also use `scale!` or write the loop out
>
>>
>> ϕ = Pl*ϕ
>
> The assignment is not necessary.
>
>> ϕ .*= ko2
>
> scale!
>
>> ϕ = Pl\ϕ
>
> Unnecessary
>
>> # if check Δϕ for every step, 35s is needed.
>> if nstep>500
>> Δϕ = maxabs(ϕ-ϕ′)
>> end
>> nstep += 1
>> if mod(nstep,200)==0
>> print(nstep," ")
>> print("Δϕ=",Δϕ," ")
>> end
>>
>> end
>> ϕ .*= vo
>>
>> ϕ = Pl*ϕ
>> ϕ .*= ko
>> ϕ = Pl\ϕ
>>
>> normphi = sqrt(sumabs2(ϕ))*dx
>> ϕ /= normphi
>> end
>>