I haven't look at this myself, but have you tried Stefan's suggestion to look 
at `@code_warntype`? This might not be a preallocation issue, it might be a 
type-stability issue.

--Tim

On Tuesday, March 29, 2016 01:56:21 PM Yichao Yu wrote:
> On Tue, Mar 29, 2016 at 1:50 PM, 博陈 <[email protected]> wrote:
> > Additionally, the allocation problem is not solved. I guess this
> > http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-tempo
> > rary-arrays-being-created-in-a-function-td14492.html might be helpful, but
> > I just don't know how to change my code.
> 
> The only places you create temporary arrays according to your profile is
> the `sum(A[1:n])` and you just need to loop from 1:n manually instead of
> creating an subarray
> 
> > 在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道:
> > 
> >> On Tue, Mar 29, 2016 at 12:43 PM, 博陈 <[email protected]> wrote:
> >>> I tried the built-in profiler, and find that the problem lies in lines I
> >>> end  with ******, the result is shown below:
> >>> that proved my guess, how can I pre-allocate these arrays? If I don't
> >>> want to divide this code into several parts that calculate these arrays
> >>> separately.
> >> 
> >> I don't understand what you mean by `divide this code into several parts
> >> that calculate these arrays separately`
> >> 
> >>> | lines | backtrace |
> >>> | 
> >>> |   169 |      9011 |  ***********
> >>> |   
> >>> |   173 |      1552 |
> >>> |   
> >>> |   175 |      2604 |
> >>> |   
> >>> |   179 |      2906 |
> >>> |   
> >>> |   181 |      1541 |
> >>> |   
> >>> |   192 |      4458 |
> >>> |   
> >>> |   211 |     13332 ************|
> >>> |   
> >>> |   214 |      8431 |************
> >>> |   
> >>> |   218 |     15871 |***********
> >>> |   
> >>> |   221 |      2538 |
> >>> 
> >>> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
> >>> 
> >>>> Have you tried:
> >>>> 
> >>>> (a) calling @code_typewarn on your function
> >>>> (b) using the built-in profiler?
> >>>> 
> >>>> On Tue, Mar 29, 2016 at 9:23 AM, 博陈 <[email protected]> wrote:
> >>>>> First of all, have a look at the result.
> >>>>> 
> >>>>> 
> >>>>> <https://lh3.googleusercontent.com/-anNt-E4P1vM/Vvp-TybegZI/AAAAAAAAAB
> >>>>> E/ZvDO2xarndMSgKVOXy_hcPd5NTh-7QcEA/s1600/QQ%25E5%259B%25BE%25E7%2589%
> >>>>> 258720160329210732.png>
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> My code calculates the evolution of 1-d 2-electron system in the
> >>>>> electric field, some variables are calculated during the evolution.
> >>>>> According to the result of @time evolution, my code must have a
> >>>>> pre-allocation problem. Before you see the long code, i suggest that
> >>>>> the
> >>>>> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt
> >>>>> that I
> >>>>> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and
> >>>>> push!(m, new_m) inside the loop to pre-allocate the variable m, but in
> >>>>> my
> >>>>> situations, I don't know how to pre-allocate these arrays.
> >>>>> 
> >>>>> Below is the script (precisely, the main function) itself.
> >>>>> 
> >>>>> function evolution(ϕ::Array{Complex{Float64}, 2},
> >>>>> 
> >>>>>                    ele::Array{Float64, 1}, dx::Float64, dt::Float64,
> >>>>>                    flags::Tuple{Int64, Int64, Int64, Int64})
> >>>>>     
> >>>>>     ϕg = copy(ϕ)
> >>>>>     FFTW.set_num_threads(8)
> >>>>>     ns = length( ϕ[:, 1] )
> >>>>>     x = get_x(ns, dx)
> >>>>>     p = get_p(ns, dx)
> >>>>>     if flags[4] == 1
> >>>>>     
> >>>>>         pp = similar(p)
> >>>>>         A = -cumsum(ele) * dt
> >>>>>         A² = A.*A
> >>>>>         ##### splitting
> >>>>>         r_sp = 150.0
> >>>>>         δ_sp = 5.0
> >>>>>         splitter = Array(Float64, ns, ns)
> >>>>>     
> >>>>>     end
> >>>>>     nt = length( ele )
> >>>>>     
> >>>>>     # ##### Pre-allocate result and temporary arrays
> >>>>>     #if flags[1] == 1
> >>>>>     σ = zeros(Complex128, nt)
> >>>>>     #end
> >>>>>     #if flags[2] == 1
> >>>>>     a = zeros(Float64, nt)
> >>>>>     #end
> >>>>>     #if flags[3] == 1
> >>>>>     r_ionization = 20.0
> >>>>>     n1 = round(Int, ns/2 - r_ionization/dx)
> >>>>>     n2 = round(Int, ns/2 + r_ionization/dx)
> >>>>>     ip = zeros(Float64, nt)
> >>>>>     #end
> >>>>>     
> >>>>>     ##### FFT plan
> >>>>>     p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
> >>>>>     
> >>>>>     prop_x = similar( ϕ )
> >>>>>     prop_p = similar( prop_x )
> >>>>>     prop_e = similar( prop_x )
> >>>>>     # this two versions just cost the same time
> >>>>>     xplusy = Array(Float64, ns, ns)
> >>>>>     #xplusy = Array( Float64, ns^2)
> >>>>>     
> >>>>>     ##### absorb boundary
> >>>>>     r_a = ns * dx /2 - 50.0
> >>>>>     δ = 10.0
> >>>>>     absorb = Array(Float64, ns, ns)
> >>>>>     
> >>>>>     k0 = 2π / (ns * dx)
> >>>>>     
> >>>>>     @inbounds for j in 1:ns
> >>>>>     
> >>>>>         @inbounds for i in 1:ns
> >>>>>         
> >>>>>             prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt /
> >>>>> 
> >>>>> 2 )
> >>>>> 
> >>>>>             prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
> >>>>>             
> >>>>>             xplusy[i, j] = x[i] + x[j]
> >>>>>             
> >>>>>             absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 -
> >>>>> 
> >>>>> get_out(x[j],
> >>>>> 
> >>>>>              r_a, δ))
> >>>>>         
> >>>>>         end
> >>>>>     
> >>>>>     end
> >>>>>     
> >>>>>     if flags[2] == 1
> >>>>>     
> >>>>>         pvpx = Array(Float64, ns, ns)
> >>>>>         @inbounds for j in 1:ns
> >>>>>         
> >>>>>             @inbounds for i in 1:ns
> >>>>>             
> >>>>>                 pvpx[i, j] = get_pvpx(x[i], x[j])
> >>>>>             
> >>>>>             end
> >>>>>         
> >>>>>         end
> >>>>>     
> >>>>>     end
> >>>>>     
> >>>>>     if flags[4] == 1
> >>>>>     
> >>>>>         ϕo = zeros(Complex128, ns, ns)
> >>>>>         ϕp = zeros(Complex128, ns, ns)
> >>>>>         @inbounds for  j in 1:ns
> >>>>>         
> >>>>>             @inbounds for  i in 1:ns
> >>>>>             
> >>>>>                 splitter[i, j] = get_out(x[i], r_sp, δ_sp) *
> >>>>> 
> >>>>> get_out(x[j], r_sp, δ_sp)
> >>>>> 
> >>>>>             end
> >>>>>         
> >>>>>         end
> >>>>>     
> >>>>>     end
> >>>>>     
> >>>>>     for i in 1:nt
> >>>>>     
> >>>>>         for j in eachindex(ϕ)
> >>>>>         
> >>>>>             prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0)
> >>>>> 
> >>>>> ************************************169
> >> 
> >> You might be hitting a stupid inlining issue here, try adding parenthesis
> >> to the multiplication
> >> (i.e. instead of `a * b * c * d` do `(a * b) * (c * d)`)
> >> 
> >>>         end
> >>>         
> >>>>>         for j in eachindex(ϕ)
> >>>>>         
> >>>>>             ϕ[j] *= prop_x[j] * prop_e[j]
> >>>>>         
> >>>>>         end
> >>>>>         p_fft! * ϕ
> >>>>>         for j in eachindex(ϕ)
> >>>>>         
> >>>>>             ϕ[j] *= prop_p[j]
> >>>>>         
> >>>>>         end
> >>>>>         p_fft! \ ϕ
> >>>>>         for j in eachindex(ϕ)
> >>>>>         
> >>>>>             ϕ[j] *= prop_x[j] * prop_e[j]
> >>>>>         
> >>>>>         end
> >>>>>         ########## autocorrelation function σ(t)
> >>>>>         if flags[1] == 1
> >>>>>         
> >>>>>             for j in eachindex(ϕ)
> >>>>>             
> >>>>>                 σ[i] += conj(ϕg[j]) * ϕ[j]
> >>>>>             
> >>>>>             end
> >>>>>         
> >>>>>         end
> >>>>>         ########## dipole acceleration a(t)
> >>>>>         if flags[2] == 1
> >>>>>         
> >>>>>             for j in eachindex(ϕ)
> >>>>>             
> >>>>>                 a[i] += abs(ϕ[j])^2 * (pvpx[j] + 2ele[i])
> >>>>>             
> >>>>>             end
> >>>>>         
> >>>>>         end
> >>>>>         ########## ionization probability ip(t)
> >>>>>         if flags[3] == 1
> >>>>>         
> >>>>>             for j1 in n1:n2
> >>>>>             
> >>>>>                 for j2 in 1:ns
> >>>>>                 
> >>>>>                     ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
> >>>>>                 
> >>>>>                 end
> >>>>>             
> >>>>>             end
> >>>>>             for j1 in [1:n1-1; n2+1:ns]
> >>>>>             
> >>>>>                 for j2 in n1:n2
> >>>>>                 
> >>>>>                     ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
> >>>>>                 
> >>>>>                 end
> >>>>>             
> >>>>>             end
> >>>>>         
> >>>>>         end
> >>>>>         ########## get momentum
> >>>>>         if flags[4] == 1
> >>>>>         
> >>>>>             for j in eachindex(ϕo)
> >>>>>             
> >>>>>                 ϕo[j] = ϕ[j] * splitter[j] * exp( -im * A[i]*xplusy[j]
> >>>>> 
> >>>>> ) **********************************211
> >> 
> >> Same with above
> >> 
> >>>             end
> >>>             
> >>>>>             for j in eachindex(p)
> >>>>>             
> >>>>>                 pp[j] = p[j]^2 /2 * (nt-i) - p[j] *sum( A[i:nt] ) +
> >>>>> 
> >>>>> sum( A²[1:nt] ) /2 ******************214
> >> 
> >> write out the sum directly, you can do with a helper function
> >> Using subarray would also eliminate the data copy but is still suboptimum
> >> as it is now.
> >> 
> >>>             end
> >>>             
> >>>>>             for j2 in 1:ns
> >>>>>             
> >>>>>                 for j1 in 1:ns
> >>>>>                 
> >>>>>                     ϕo[j1, j2] = ϕo[j1, j2] * exp( -im * (pp[j1] +
> >>>>> 
> >>>>> pp[j2]) * dt)************************218
> >> 
> >> I don't see any obvious problem, (apart from the potential inlining issue
> >> as above) but it does look like a keep loop with c function call so it
> >> won't be surprising if most of the time is spent here.
> >> 
> >>>                 end
> >>>>>             
> >>>>>             end
> >>>>>             p_fft! * ϕo
> >>>>>             for j in eachindex(ϕp)
> >>>>>             
> >>>>>                 ϕp[j] += ϕo[j]
> >>>>>             
> >>>>>             end
> >>>>>         
> >>>>>         end
> >>>>>         
> >>>>>         ########## absorb boundary
> >>>>>         if mod(i, 300) == 0
> >>>>>         
> >>>>>             for j in eachindex(ϕ)
> >>>>>             
> >>>>>                 ϕ[j] *= absorb[j]
> >>>>>             
> >>>>>             end
> >>>>>         
> >>>>>         end
> >>>>>         
> >>>>>         if (mod(i, 500) == 0)
> >>>>>         
> >>>>>             println("i = $i")
> >>>>>             flush(STDOUT)
> >>>>>         
> >>>>>         end
> >>>>>     
> >>>>>     end
> >>>>>     σ *= dx^2
> >>>>>     a *= dx^2
> >>>>>     ip *= dx^2
> >>>>>     
> >>>>>     save("data/fs.jld", "ϕ", ϕ)
> >>>>>     if flags[1] == 1
> >>>>>     
> >>>>>         save("data/sigma.jld", "σ", σ)
> >>>>>     
> >>>>>     end
> >>>>>     if flags[2] == 1
> >>>>>     
> >>>>>         save("data/a.jld", "a", a)
> >>>>>     
> >>>>>     end
> >>>>>     if flags[3] == 1
> >>>>>     
> >>>>>         save("data/ip.jld", "ip", ip)
> >>>>>     
> >>>>>     end
> >>>>>     if flags[4] == 1
> >>>>>     
> >>>>>         save("data/pf.jld", "ϕp", ϕp)
> >>>>>     
> >>>>>     end
> >>>>>     
> >>>>>     #return σ, a, ip, ϕ
> >>>>>     nothing
> >>>>> 
> >>>>> end

Reply via email to