I tried @code_warntype, and the result show that the red alert appear only in the io part. Maybe it's not a type-stability issue.
在 2016年3月30日星期三 UTC+8上午3:18:24,Tim Holy写道: > > I haven't look at this myself, but have you tried Stefan's suggestion to > look > at `@code_warntype`? This might not be a preallocation issue, it might be > a > type-stability issue. > > --Tim > > On Tuesday, March 29, 2016 01:56:21 PM Yichao Yu wrote: > > On Tue, Mar 29, 2016 at 1:50 PM, 博陈 <[email protected] <javascript:>> > wrote: > > > Additionally, the allocation problem is not solved. I guess this > > > > http://julia-programming-language.2336112.n4.nabble.com/How-to-avoid-tempo > > > rary-arrays-being-created-in-a-function-td14492.html might be helpful, > but > > > I just don't know how to change my code. > > > > The only places you create temporary arrays according to your profile is > > the `sum(A[1:n])` and you just need to loop from 1:n manually instead of > > creating an subarray > > > > > 在 2016年3月30日星期三 UTC+8上午1:15:07,Yichao Yu写道: > > > > > >> On Tue, Mar 29, 2016 at 12:43 PM, 博陈 <[email protected]> wrote: > > >>> I tried the built-in profiler, and find that the problem lies in > lines I > > >>> end with ******, the result is shown below: > > >>> that proved my guess, how can I pre-allocate these arrays? If I > don't > > >>> want to divide this code into several parts that calculate these > arrays > > >>> separately. > > >> > > >> I don't understand what you mean by `divide this code into several > parts > > >> that calculate these arrays separately` > > >> > > >>> | lines | backtrace | > > >>> | > > >>> | 169 | 9011 | *********** > > >>> | > > >>> | 173 | 1552 | > > >>> | > > >>> | 175 | 2604 | > > >>> | > > >>> | 179 | 2906 | > > >>> | > > >>> | 181 | 1541 | > > >>> | > > >>> | 192 | 4458 | > > >>> | > > >>> | 211 | 13332 ************| > > >>> | > > >>> | 214 | 8431 |************ > > >>> | > > >>> | 218 | 15871 |*********** > > >>> | > > >>> | 221 | 2538 | > > >>> > > >>> 在 2016年3月29日星期二 UTC+8下午9:27:27,Stefan Karpinski写道: > > >>> > > >>>> Have you tried: > > >>>> > > >>>> (a) calling @code_typewarn on your function > > >>>> (b) using the built-in profiler? > > >>>> > > >>>> On Tue, Mar 29, 2016 at 9:23 AM, 博陈 <[email protected]> wrote: > > >>>>> First of all, have a look at the result. > > >>>>> > > >>>>> > > >>>>> < > https://lh3.googleusercontent.com/-anNt-E4P1vM/Vvp-TybegZI/AAAAAAAAAB > > >>>>> > E/ZvDO2xarndMSgKVOXy_hcPd5NTh-7QcEA/s1600/QQ%25E5%259B%25BE%25E7%2589% > > >>>>> 258720160329210732.png> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> My code calculates the evolution of 1-d 2-electron system in the > > >>>>> electric field, some variables are calculated during the > evolution. > > >>>>> According to the result of @time evolution, my code must have a > > >>>>> pre-allocation problem. Before you see the long code, i suggest > that > > >>>>> the > > >>>>> hotspot might be around the Arrays prop_e, \phio, pp. I have > learnt > > >>>>> that I > > >>>>> can use m = Array(Float64, 1) outside a "for" loop and empty!(m) > and > > >>>>> push!(m, new_m) inside the loop to pre-allocate the variable m, > but in > > >>>>> my > > >>>>> situations, I don't know how to pre-allocate these arrays. > > >>>>> > > >>>>> Below is the script (precisely, the main function) itself. > > >>>>> > > >>>>> function evolution(ϕ::Array{Complex{Float64}, 2}, > > >>>>> > > >>>>> ele::Array{Float64, 1}, dx::Float64, > dt::Float64, > > >>>>> flags::Tuple{Int64, Int64, Int64, Int64}) > > >>>>> > > >>>>> ϕg = copy(ϕ) > > >>>>> FFTW.set_num_threads(8) > > >>>>> ns = length( ϕ[:, 1] ) > > >>>>> x = get_x(ns, dx) > > >>>>> p = get_p(ns, dx) > > >>>>> if flags[4] == 1 > > >>>>> > > >>>>> pp = similar(p) > > >>>>> A = -cumsum(ele) * dt > > >>>>> A² = A.*A > > >>>>> ##### splitting > > >>>>> r_sp = 150.0 > > >>>>> δ_sp = 5.0 > > >>>>> splitter = Array(Float64, ns, ns) > > >>>>> > > >>>>> end > > >>>>> nt = length( ele ) > > >>>>> > > >>>>> # ##### Pre-allocate result and temporary arrays > > >>>>> #if flags[1] == 1 > > >>>>> σ = zeros(Complex128, nt) > > >>>>> #end > > >>>>> #if flags[2] == 1 > > >>>>> a = zeros(Float64, nt) > > >>>>> #end > > >>>>> #if flags[3] == 1 > > >>>>> r_ionization = 20.0 > > >>>>> n1 = round(Int, ns/2 - r_ionization/dx) > > >>>>> n2 = round(Int, ns/2 + r_ionization/dx) > > >>>>> ip = zeros(Float64, nt) > > >>>>> #end > > >>>>> > > >>>>> ##### FFT plan > > >>>>> p_fft! = plan_fft!( similar(ϕ), flags=FFTW.MEASURE ) > > >>>>> > > >>>>> prop_x = similar( ϕ ) > > >>>>> prop_p = similar( prop_x ) > > >>>>> prop_e = similar( prop_x ) > > >>>>> # this two versions just cost the same time > > >>>>> xplusy = Array(Float64, ns, ns) > > >>>>> #xplusy = Array( Float64, ns^2) > > >>>>> > > >>>>> ##### absorb boundary > > >>>>> r_a = ns * dx /2 - 50.0 > > >>>>> δ = 10.0 > > >>>>> absorb = Array(Float64, ns, ns) > > >>>>> > > >>>>> k0 = 2π / (ns * dx) > > >>>>> > > >>>>> @inbounds for j in 1:ns > > >>>>> > > >>>>> @inbounds for i in 1:ns > > >>>>> > > >>>>> prop_x[i, j] = exp( -im * get_potential(x[i], x[j]) * > dt / > > >>>>> > > >>>>> 2 ) > > >>>>> > > >>>>> prop_p[i, j] = exp( -im * (p[i]^2 + p[j]^2)/2 * dt ) > > >>>>> > > >>>>> xplusy[i, j] = x[i] + x[j] > > >>>>> > > >>>>> absorb[i, j] = (1.0 - get_out(x[i], r_a, δ ))* (1.0 - > > >>>>> > > >>>>> get_out(x[j], > > >>>>> > > >>>>> r_a, δ)) > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> > > >>>>> if flags[2] == 1 > > >>>>> > > >>>>> pvpx = Array(Float64, ns, ns) > > >>>>> @inbounds for j in 1:ns > > >>>>> > > >>>>> @inbounds for i in 1:ns > > >>>>> > > >>>>> pvpx[i, j] = get_pvpx(x[i], x[j]) > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> > > >>>>> if flags[4] == 1 > > >>>>> > > >>>>> ϕo = zeros(Complex128, ns, ns) > > >>>>> ϕp = zeros(Complex128, ns, ns) > > >>>>> @inbounds for j in 1:ns > > >>>>> > > >>>>> @inbounds for i in 1:ns > > >>>>> > > >>>>> splitter[i, j] = get_out(x[i], r_sp, δ_sp) * > > >>>>> > > >>>>> get_out(x[j], r_sp, δ_sp) > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> > > >>>>> for i in 1:nt > > >>>>> > > >>>>> for j in eachindex(ϕ) > > >>>>> > > >>>>> prop_e[j] = exp( -im * ele[i] * xplusy[j] * dt/2.0) > > >>>>> > > >>>>> ************************************169 > > >> > > >> You might be hitting a stupid inlining issue here, try adding > parenthesis > > >> to the multiplication > > >> (i.e. instead of `a * b * c * d` do `(a * b) * (c * d)`) > > >> > > >>> end > > >>> > > >>>>> for j in eachindex(ϕ) > > >>>>> > > >>>>> ϕ[j] *= prop_x[j] * prop_e[j] > > >>>>> > > >>>>> end > > >>>>> p_fft! * ϕ > > >>>>> for j in eachindex(ϕ) > > >>>>> > > >>>>> ϕ[j] *= prop_p[j] > > >>>>> > > >>>>> end > > >>>>> p_fft! \ ϕ > > >>>>> for j in eachindex(ϕ) > > >>>>> > > >>>>> ϕ[j] *= prop_x[j] * prop_e[j] > > >>>>> > > >>>>> end > > >>>>> ########## autocorrelation function σ(t) > > >>>>> if flags[1] == 1 > > >>>>> > > >>>>> for j in eachindex(ϕ) > > >>>>> > > >>>>> σ[i] += conj(ϕg[j]) * ϕ[j] > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> ########## dipole acceleration a(t) > > >>>>> if flags[2] == 1 > > >>>>> > > >>>>> for j in eachindex(ϕ) > > >>>>> > > >>>>> a[i] += abs(ϕ[j])^2 * (pvpx[j] + 2ele[i]) > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> ########## ionization probability ip(t) > > >>>>> if flags[3] == 1 > > >>>>> > > >>>>> for j1 in n1:n2 > > >>>>> > > >>>>> for j2 in 1:ns > > >>>>> > > >>>>> ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2 > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> for j1 in [1:n1-1; n2+1:ns] > > >>>>> > > >>>>> for j2 in n1:n2 > > >>>>> > > >>>>> ip[i] += abs( ϕ[j2+ns*(j1-1)] )^2 > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> ########## get momentum > > >>>>> if flags[4] == 1 > > >>>>> > > >>>>> for j in eachindex(ϕo) > > >>>>> > > >>>>> ϕo[j] = ϕ[j] * splitter[j] * exp( -im * > A[i]*xplusy[j] > > >>>>> > > >>>>> ) **********************************211 > > >> > > >> Same with above > > >> > > >>> end > > >>> > > >>>>> for j in eachindex(p) > > >>>>> > > >>>>> pp[j] = p[j]^2 /2 * (nt-i) - p[j] *sum( A[i:nt] ) > + > > >>>>> > > >>>>> sum( A²[1:nt] ) /2 ******************214 > > >> > > >> write out the sum directly, you can do with a helper function > > >> Using subarray would also eliminate the data copy but is still > suboptimum > > >> as it is now. > > >> > > >>> end > > >>> > > >>>>> for j2 in 1:ns > > >>>>> > > >>>>> for j1 in 1:ns > > >>>>> > > >>>>> ϕo[j1, j2] = ϕo[j1, j2] * exp( -im * (pp[j1] + > > >>>>> > > >>>>> pp[j2]) * dt)************************218 > > >> > > >> I don't see any obvious problem, (apart from the potential inlining > issue > > >> as above) but it does look like a keep loop with c function call so > it > > >> won't be surprising if most of the time is spent here. > > >> > > >>> end > > >>>>> > > >>>>> end > > >>>>> p_fft! * ϕo > > >>>>> for j in eachindex(ϕp) > > >>>>> > > >>>>> ϕp[j] += ϕo[j] > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> > > >>>>> ########## absorb boundary > > >>>>> if mod(i, 300) == 0 > > >>>>> > > >>>>> for j in eachindex(ϕ) > > >>>>> > > >>>>> ϕ[j] *= absorb[j] > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> > > >>>>> if (mod(i, 500) == 0) > > >>>>> > > >>>>> println("i = $i") > > >>>>> flush(STDOUT) > > >>>>> > > >>>>> end > > >>>>> > > >>>>> end > > >>>>> σ *= dx^2 > > >>>>> a *= dx^2 > > >>>>> ip *= dx^2 > > >>>>> > > >>>>> save("data/fs.jld", "ϕ", ϕ) > > >>>>> if flags[1] == 1 > > >>>>> > > >>>>> save("data/sigma.jld", "σ", σ) > > >>>>> > > >>>>> end > > >>>>> if flags[2] == 1 > > >>>>> > > >>>>> save("data/a.jld", "a", a) > > >>>>> > > >>>>> end > > >>>>> if flags[3] == 1 > > >>>>> > > >>>>> save("data/ip.jld", "ip", ip) > > >>>>> > > >>>>> end > > >>>>> if flags[4] == 1 > > >>>>> > > >>>>> save("data/pf.jld", "ϕp", ϕp) > > >>>>> > > >>>>> end > > >>>>> > > >>>>> #return σ, a, ip, ϕ > > >>>>> nothing > > >>>>> > > >>>>> end > >
