When you said "temporary arrays", did you mean that I should use less global variables? Yes, I can write arrays such as x, px only in the functions Potential and Ko, but that will make the code less understandable.
When starting the code, I use Ks=2^11, but when I finally plot the \phi function, I get an error, and it seems that I misunderstood the linspace function (I wrote linspace(-pm,pm,Ks) at first), so I changed it to 2^11+1. It's a bad choice, and now I realize it. When use Ks=2^11-1, it only takes 81 seconds for the code to run, which is slightly faster than the Matlab version. I have already stated the variables \phi and Pl outside the loop, and in my opinion, they are pre-allocated and "should" be manipulated in place. To be honest, maybe I don't really understand the words "in place" and "pre-allocated". I changed the sentences normalizing the array and use the function scale!(), and get a performance improvement of about 5s, that's also very good. But I don't understand why the assignment \phi .*=ko2 is unnecessary. Are you saying that I should rewrite them as scale(\phi, ko2)? I'll have a try. Finally, I do really appreciated all your help. Mainly following your advice, my code got great performance improvement. 在 2015年12月15日星期二 UTC+8上午2:13:12,Yichao Yu写道: > > On Sun, Dec 13, 2015 at 10:35 PM, 博陈 <[email protected] <javascript:>> > wrote: > > This is my first julia code, I am happy it did the right thing, but > compared > > with the Matlab code that did the same thing, it runs so slowly. The > Matlab > > code takes about 90s, but the julia below takes 130s. > > There's a few problems with the code. See below. You are still > creating many temporary arrays which is in general a bad idea. > > > > > Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.) + > > 1./sqrt((x-y)^2+1.) > > Ko(x::Float64,y::Float64) = x^2+y^2 > > > > function ground() > > R = 320. > > Ks = 2^11 > > x = linspace(-R, R, Ks+1) > > As a general advice, don't use this dimension for fft. If you replace > you Ks with 2^11 - 1, you will see at least 2x speed up. > > > dx = 2R/Ks > > dt = 0.1 > > pm = 2π/2dx > > px = circshift( linspace(-pm, pm, Ks+1), round(Int64, Ks/2) ) > > FFTW.set_num_threads(8) > > > > V = Float64[Potential(ix, iy) for ix in x, iy in x] > > ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px] > > ko2 = ko.^2 > > vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x] > > ϕ = Array(Complex128,(Ks+1,Ks+1)) > > Pl = plan_fft!(ϕ; flags=FFTW.MEASURE) > > ϕ = V.*complex(1.) > > > > normphi = sqrt(sumabs2(ϕ))*dx > > ϕ /= normphi > > > > ϕ = Pl*ϕ > > ϕ .*= ko > > ϕ = Pl\ϕ > > > > Δϕ = 1. > > nstep = 0 > > print("start loop") > > while Δϕ > 1.e-15 > > ϕ′ = ϕ > > You are better off pre-allocate the two buffers and swap them > > > ϕ .*= vo > > Use `scale!` > > > > > normphi = sqrt(sumabs2(ϕ))*dx > > ϕ /= normphi > > Also use `scale!` or write the loop out > > > > > ϕ = Pl*ϕ > > The assignment is not necessary. > > > ϕ .*= ko2 > > scale! > > > ϕ = Pl\ϕ > > Unnecessary > > > # if check Δϕ for every step, 35s is needed. > > if nstep>500 > > Δϕ = maxabs(ϕ-ϕ′) > > end > > nstep += 1 > > if mod(nstep,200)==0 > > print(nstep," ") > > print("Δϕ=",Δϕ," ") > > end > > > > end > > ϕ .*= vo > > > > ϕ = Pl*ϕ > > ϕ .*= ko > > ϕ = Pl\ϕ > > > > normphi = sqrt(sumabs2(ϕ))*dx > > ϕ /= normphi > > end > > >
