When you said "temporary arrays", did you mean that I should use less 
global variables? Yes, I can write arrays such as x, px only in the 
functions Potential and Ko, but that will make the code less 
understandable. 

When starting the code, I use Ks=2^11, but when I finally plot the \phi 
function, I get an error, and it seems that I misunderstood the linspace 
function (I wrote linspace(-pm,pm,Ks) at first), so I changed it to 2^11+1. 
It's a bad choice, and now I realize it. When use Ks=2^11-1, it only takes 
81 seconds for the code to run, which is slightly faster than the Matlab 
version.

I have already stated the variables \phi and Pl outside the loop, and in my 
opinion, they are pre-allocated and "should" be manipulated in place. To be 
honest, maybe I don't really understand the words "in place" and 
"pre-allocated".

I changed the sentences normalizing the array and use the function 
scale!(), and get a performance improvement of about 5s, that's also very 
good.

But I don't understand why the assignment \phi .*=ko2 is unnecessary.  Are 
you saying that I should rewrite them as scale(\phi, ko2)? I'll have a try.


Finally, I do really appreciated all your help. Mainly following your 
advice, my code got great performance improvement. 





在 2015年12月15日星期二 UTC+8上午2:13:12,Yichao Yu写道:
>
> On Sun, Dec 13, 2015 at 10:35 PM, 博陈 <[email protected] <javascript:>> 
> wrote: 
> > This is my first julia code, I am happy it did the right thing, but 
> compared 
> > with the Matlab code that did the same thing, it runs so slowly. The 
> Matlab 
> > code takes about 90s, but the julia below takes 130s. 
>
> There's a few problems with the code. See below. You are still 
> creating many temporary arrays which is in general a bad idea. 
>
> > 
> > Potential(x::Float64, y::Float64) = -2./sqrt(x^2+1.) -2./sqrt(y^2+1.)  + 
> > 1./sqrt((x-y)^2+1.) 
> > Ko(x::Float64,y::Float64) = x^2+y^2 
> > 
> > function ground() 
> >     R = 320. 
> >     Ks = 2^11 
> >     x = linspace(-R, R, Ks+1) 
>
> As a general advice, don't use this dimension for fft. If you replace 
> you Ks with 2^11 - 1, you will see at least 2x speed up. 
>
> >     dx = 2R/Ks 
> >     dt = 0.1 
> >     pm = 2π/2dx 
> >     px = circshift( linspace(-pm, pm, Ks+1),  round(Int64, Ks/2) ) 
> >     FFTW.set_num_threads(8) 
> > 
> >     V = Float64[Potential(ix, iy)  for ix in x, iy in x] 
> >     ko = Float64[e^(-dt/4* (Ko(ipx,ipy))) for ipx in px, ipy in px] 
> >     ko2 = ko.^2 
> >     vo = Float64[e^(-dt* (Potential(ix,iy))) for ix in x, iy in x] 
> >     ϕ = Array(Complex128,(Ks+1,Ks+1)) 
> >     Pl = plan_fft!(ϕ; flags=FFTW.MEASURE) 
> >     ϕ = V.*complex(1.) 
> > 
> >     normphi = sqrt(sumabs2(ϕ))*dx 
> >     ϕ /= normphi 
> > 
> >     ϕ = Pl*ϕ 
> >     ϕ .*= ko 
> >     ϕ = Pl\ϕ 
> > 
> >     Δϕ = 1. 
> >     nstep = 0 
> >     print("start loop") 
> >      while Δϕ > 1.e-15 
> >         ϕ′ = ϕ 
>
> You are better off pre-allocate the two buffers and swap them 
>
> >         ϕ .*= vo 
>
> Use `scale!` 
>
> > 
> >         normphi = sqrt(sumabs2(ϕ))*dx 
> >         ϕ /= normphi 
>
> Also use `scale!` or write the loop out 
>
> > 
> >         ϕ = Pl*ϕ 
>
> The assignment is not necessary. 
>
> >         ϕ .*= ko2 
>
> scale! 
>
> >         ϕ = Pl\ϕ 
>
> Unnecessary 
>
> > # if check  Δϕ for every step, 35s is needed. 
> >         if nstep>500 
> >             Δϕ = maxabs(ϕ-ϕ′) 
> >         end 
> >         nstep += 1 
> >         if mod(nstep,200)==0 
> >             print(nstep,"  ") 
> >             print("Δϕ=",Δϕ,"   ") 
> >         end 
> > 
> >     end 
> >     ϕ .*= vo 
> > 
> >     ϕ = Pl*ϕ 
> >     ϕ .*= ko 
> >     ϕ = Pl\ϕ 
> > 
> >     normphi = sqrt(sumabs2(ϕ))*dx 
> >     ϕ /= normphi 
> > end 
> > 
>

Reply via email to