Surely not the main issue here but doing `length( ϕ[:, 1] )` wastes memory 
for nothing. Instead use `size(ϕ, 1)` ?
Also `for j1 in [1:n1-1; n2+1:ns]` this can be split to 2 for loops to 
avoid allocations and memory loads. 
j1 is off by 1 in indexing so just decrease ranges by 1 as `0:n1-2` and 
n2:ns-1` and use j1 instead of `j1-1` (The same for previous loop).

On Wednesday, March 30, 2016 at 8:58:55 AM UTC+3, 博陈 wrote:
>
> I rewrote my code and manually loop from 1:n, the pre-allocation problem 
> is solved. I also added some parenthesis as you suggested, that helps, but 
> not very much. 
>
> 在 2016年3月30日星期三 UTC+8上午1:56:47,Yichao Yu写道:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 1:50 PM, 博陈 <[email protected]> wrote:
>>
>>> Additionally, the allocation problem is not solved. I guess this 
>>> http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-temporary-arrays-being-created-in-a-function-td14492.html
>>>  might 
>>> be helpful, but I just don't know how to change my code.
>>>
>>>
>> The only places you create temporary arrays according to your profile is 
>> the `sum(A[1:n])` and you just need to loop from 1:n manually instead of 
>> creating an subarray
>>  
>>
>>>
>>>
>>> 在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道:
>>>
>>>>
>>>>
>>>> On Tue, Mar 29, 2016 at 12:43 PM, 博陈 <[email protected]> wrote:
>>>>
>>>>> I tried the built-in profiler, and find that the problem lies in lines 
>>>>> I end  with ******, the result is shown below:
>>>>> that proved my guess, how can I pre-allocate these arrays? If I don't 
>>>>> want to divide this code into several parts that calculate these arrays 
>>>>> separately. 
>>>>>
>>>>
>>>> I don't understand what you mean by `divide this code into several 
>>>> parts that calculate these arrays separately`
>>>>  
>>>>
>>>>> | lines | backtrace |
>>>>>
>>>>> |   169 |      9011 |  ***********
>>>>>
>>>>> |   173 |      1552 |
>>>>>
>>>>> |   175 |      2604 |
>>>>>
>>>>> |   179 |      2906 |
>>>>>
>>>>> |   181 |      1541 |
>>>>>
>>>>> |   192 |      4458 |
>>>>>
>>>>> |   211 |     13332 ************|
>>>>>
>>>>> |   214 |      8431 |************
>>>>>
>>>>> |   218 |     15871 |***********
>>>>>
>>>>> |   221 |      2538 |
>>>>>
>>>>>
>>>>> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道:
>>>>>>
>>>>>> Have you tried:
>>>>>>
>>>>>> (a) calling @code_typewarn on your function
>>>>>> (b) using the built-in profiler?
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 29, 2016 at 9:23 AM, 博陈 <[email protected]> wrote:
>>>>>>
>>>>>>> First of all, have a look at the result.
>>>>>>>
>>>>>>>
>>>>>>> <https://lh3.googleusercontent.com/-anNt-E4P1vM/Vvp-TybegZI/AAAAAAAAABE/ZvDO2xarndMSgKVOXy_hcPd5NTh-7QcEA/s1600/QQ%25E5%259B%25BE%25E7%2589%258720160329210732.png>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> My code calculates the evolution of 1-d 2-electron system in the 
>>>>>>> electric field, some variables are calculated during the evolution.
>>>>>>> According to the result of @time evolution, my code must have a 
>>>>>>> pre-allocation problem. Before you see the long code, i suggest that 
>>>>>>> the 
>>>>>>> hotspot might be around the Arrays prop_e, \phio, pp. I have learnt 
>>>>>>> that I 
>>>>>>> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) and 
>>>>>>> push!(m, new_m) inside the loop to pre-allocate the variable m, but in 
>>>>>>> my 
>>>>>>> situations, I don't know how to pre-allocate these arrays.
>>>>>>>
>>>>>>> Below is the script (precisely, the main function) itself.
>>>>>>>
>>>>>>> function evolution(ϕ::Array{Complex{Float64}, 2},
>>>>>>>                    ele::Array{Float64, 1}, dx::Float64, dt::Float64,
>>>>>>>                    flags::Tuple{Int64, Int64, Int64, Int64})
>>>>>>>     ϕg = copy(ϕ)
>>>>>>>     FFTW.set_num_threads(8)
>>>>>>>     ns = length( ϕ[:, 1] )
>>>>>>>     x = get_x(ns, dx)
>>>>>>>     p = get_p(ns, dx)
>>>>>>>     if flags[4] == 1
>>>>>>>         pp = similar(p)
>>>>>>>         A = -cumsum(ele) * dt
>>>>>>>         A² = A.*A
>>>>>>>         ##### splitting
>>>>>>>         r_sp = 150.0
>>>>>>>         δ_sp = 5.0
>>>>>>>         splitter = Array(Float64, ns, ns)
>>>>>>>     end
>>>>>>>     nt = length( ele )
>>>>>>>
>>>>>>>     # ##### Pre-allocate result and temporary arrays
>>>>>>>     #if flags[1] == 1
>>>>>>>     σ = zeros(Complex128, nt)
>>>>>>>     #end
>>>>>>>     #if flags[2] == 1
>>>>>>>     a = zeros(Float64, nt)
>>>>>>>     #end
>>>>>>>     #if flags[3] == 1
>>>>>>>     r_ionization = 20.0
>>>>>>>     n1 = round(Int, ns/2 - r_ionization/dx)
>>>>>>>     n2 = round(Int, ns/2 + r_ionization/dx)
>>>>>>>     ip = zeros(Float64, nt)
>>>>>>>     #end
>>>>>>>
>>>>>>>     ##### FFT plan
>>>>>>>     p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE )
>>>>>>>
>>>>>>>     prop_x = similar( ϕ )
>>>>>>>     prop_p = similar( prop_x )
>>>>>>>     prop_e = similar( prop_x )
>>>>>>>     # this two versions just cost the same time
>>>>>>>     xplusy = Array(Float64, ns, ns)
>>>>>>>     #xplusy = Array( Float64, ns^2)
>>>>>>>
>>>>>>>     ##### absorb boundary
>>>>>>>     r_a = ns * dx /2 - 50.0
>>>>>>>     δ = 10.0
>>>>>>>     absorb = Array(Float64, ns, ns)
>>>>>>>
>>>>>>>     k0 = 2π / (ns * dx)
>>>>>>>
>>>>>>>     @inbounds for j in 1:ns
>>>>>>>         @inbounds for i in 1:ns
>>>>>>>             prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * dt 
>>>>>>> / 2 )
>>>>>>>             prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt )
>>>>>>>
>>>>>>>             xplusy[i, j] = x[i] + x[j]
>>>>>>>
>>>>>>>             absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - 
>>>>>>> get_out(x[j],
>>>>>>>              r_a, δ))
>>>>>>>         end
>>>>>>>     end
>>>>>>>
>>>>>>>     if flags[2] == 1
>>>>>>>         pvpx = Array(Float64, ns, ns)
>>>>>>>         @inbounds for j in 1:ns
>>>>>>>             @inbounds for i in 1:ns
>>>>>>>                 pvpx[i, j] = get_pvpx(x[i], x[j])
>>>>>>>             end
>>>>>>>         end
>>>>>>>     end
>>>>>>>
>>>>>>>     if flags[4] == 1
>>>>>>>         ϕo = zeros(Complex128, ns, ns)
>>>>>>>         ϕp = zeros(Complex128, ns, ns)
>>>>>>>         @inbounds for  j in 1:ns
>>>>>>>             @inbounds for  i in 1:ns
>>>>>>>                 splitter[i, j] = get_out(x[i], r_sp, δ_sp) * 
>>>>>>> get_out(x[j], r_sp, δ_sp)
>>>>>>>             end
>>>>>>>         end
>>>>>>>     end
>>>>>>>
>>>>>>>     for i in 1:nt
>>>>>>>         for j in eachindex(ϕ)
>>>>>>>             prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0) 
>>>>>>> ************************************169
>>>>>>>
>>>>>>>
>>>> You might be hitting a stupid inlining issue here, try adding 
>>>> parenthesis to the multiplication
>>>> (i.e. instead of `a * b * c * d` do `(a * b) * (c * d)`)
>>>>  
>>>>
>>>>>         end
>>>>>>>
>>>>>>>         for j in eachindex(ϕ)
>>>>>>>             ϕ[j] *= prop_x[j] * prop_e[j]
>>>>>>>         end
>>>>>>>         p_fft! * ϕ
>>>>>>>         for j in eachindex(ϕ)
>>>>>>>             ϕ[j] *= prop_p[j]
>>>>>>>         end
>>>>>>>         p_fft! \ ϕ
>>>>>>>         for j in eachindex(ϕ)
>>>>>>>             ϕ[j] *= prop_x[j] * prop_e[j]
>>>>>>>         end
>>>>>>>         ########## autocorrelation function σ(t)
>>>>>>>         if flags[1] == 1
>>>>>>>             for j in eachindex(ϕ)
>>>>>>>                 σ[i] += conj(ϕg[j]) * ϕ[j]
>>>>>>>             end
>>>>>>>         end
>>>>>>>         ########## dipole acceleration a(t)
>>>>>>>         if flags[2] == 1
>>>>>>>             for j in eachindex(ϕ)
>>>>>>>                 a[i] += abs(ϕ[j])^2 * (pvpx[j] + 2ele[i])
>>>>>>>             end
>>>>>>>         end
>>>>>>>         ########## ionization probability ip(t)
>>>>>>>         if flags[3] == 1
>>>>>>>             for j1 in n1:n2
>>>>>>>                 for j2 in 1:ns
>>>>>>>                     ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
>>>>>>>                 end
>>>>>>>             end
>>>>>>>             for j1 in [1:n1-1; n2+1:ns]
>>>>>>>                 for j2 in n1:n2
>>>>>>>                     ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2
>>>>>>>                 end
>>>>>>>             end
>>>>>>>         end
>>>>>>>         ########## get momentum
>>>>>>>         if flags[4] == 1
>>>>>>>             for j in eachindex(ϕo)
>>>>>>>                 ϕo[j] = ϕ[j] * splitter[j] * exp( -im * 
>>>>>>> A[i]*xplusy[j] ) **********************************211
>>>>>>>
>>>>>>>
>>>> Same with above
>>>>  
>>>>
>>>>>             end
>>>>>>>             for j in eachindex(p)
>>>>>>>                 pp[j] = p[j]^2 /2 * (nt-i) - p[j] *sum( A[i:nt] ) + 
>>>>>>> sum( A²[1:nt] ) /2 ******************214
>>>>>>>
>>>>>>>
>>>> write out the sum directly, you can do with a helper function
>>>> Using subarray would also eliminate the data copy but is still 
>>>> suboptimum as it is now.
>>>>  
>>>>
>>>>>             end
>>>>>>>             for j2 in 1:ns
>>>>>>>                 for j1 in 1:ns
>>>>>>>                     ϕo[j1, j2] = ϕo[j1, j2] * exp( -im * (pp[j1] + 
>>>>>>> pp[j2]) * dt)************************218
>>>>>>>
>>>>>>>
>>>> I don't see any obvious problem, (apart from the potential inlining 
>>>> issue as above) but it does look like a keep loop with c function call so 
>>>> it won't be surprising if most of the time is spent here.
>>>>  
>>>>
>>>>>                 end
>>>>>>>             end
>>>>>>>             p_fft! * ϕo
>>>>>>>             for j in eachindex(ϕp)
>>>>>>>                 ϕp[j] += ϕo[j]
>>>>>>>             end
>>>>>>>         end
>>>>>>>
>>>>>>>         ########## absorb boundary
>>>>>>>         if mod(i, 300) == 0
>>>>>>>             for j in eachindex(ϕ)
>>>>>>>                 ϕ[j] *= absorb[j]
>>>>>>>             end
>>>>>>>         end
>>>>>>>
>>>>>>>         if (mod(i, 500) == 0)
>>>>>>>             println("i = $i")
>>>>>>>             flush(STDOUT)
>>>>>>>         end
>>>>>>>     end
>>>>>>>     σ *= dx^2
>>>>>>>     a *= dx^2
>>>>>>>     ip *= dx^2
>>>>>>>
>>>>>>>     save("data/fs.jld", "ϕ", ϕ)
>>>>>>>     if flags[1] == 1
>>>>>>>         save("data/sigma.jld", "σ", σ)
>>>>>>>     end
>>>>>>>     if flags[2] == 1
>>>>>>>         save("data/a.jld", "a", a)
>>>>>>>     end
>>>>>>>     if flags[3] == 1
>>>>>>>         save("data/ip.jld", "ip", ip)
>>>>>>>     end
>>>>>>>     if flags[4] == 1
>>>>>>>         save("data/pf.jld", "ϕp", ϕp)
>>>>>>>     end
>>>>>>>
>>>>>>>     #return σ, a, ip, ϕ
>>>>>>>     nothing
>>>>>>> end
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>

Reply via email to