On Sat, 2014-07-26 at 01:38 -0700, Ivar Nesje wrote:
> It is hard to test my assumptions when you don't post a complete
> runnable example, and profiling is a very useful tool in situations
> like this. It is also a little unclear what you mean by compile time,
> as Julia compiles the code while executing (JIT compiling).
Your assumptions all seem to have been correct. Sorry I didn't include
the code; I wasn't sure if attachments were OK and I hadn't reduced it
to a minimal example. Partially corrected version attached.
I should have said execute
include("/home/ross/UCSF/HIVRace/simpleMatch.jl"),
rather than "compile".
Profiling this indicates almost all the time going to the
affinity += beta[x[i], x[j]]
inside the loop you highlighted (next quoted section). At least I think
it does; the profile says
no file; anonymous; line: 27
and that is line 27 of simpleMatch.jl. I'm not sure why the file isn't
identified. (Full profile at the bottom).
>
> After a simple reading of the posted code the following line stood
> out .
>
>
> for i = 1:npop, j=1:npop
> affinity += beta[x[i], x[j]]
> end
>
> This loop runs for 250 * 500 iterations and allocates a 500 * 500
Why isn't that 500*500 iterations? BTW I lowered npop to 300 for the
profile.
> array on each iteration (because for some reason Julia has decided
> that x += y is not a special in place increment operation but just a
> short alias for x = x + y). It seems to me that this could be written
> in a much more efficient way without creating temporary arrays.
How do you know a temporary is being created?
Oh.. that code is completely wrong. It should be
affinity[i, j] += beta[x[i], x[j]]
So it was 500^3 operations. With that fix, everything is back to
"instantaneous". And the simulation results are looking much more
sensible.
>
> Another thing that might cause performance grief is that you have
>
>
> cut :: Vector{FloatingPoint}
> # Instead of
> cut::Vector{Float64}
>
Will do.
> FloatingPoint is a abstract type that lets you store different
> floating point types in the array (like Float16, Float32, Float64,
> Float128 and BigFloat). Unfortuenatly that means that for all
> operations on the array Julia needs to check what type the element has
> and find the right method for the operation. That is much slower than
> when it compile time knows the type to be Float64
>
>
> The last line:
>
>
> fill!(actors, Actor())
>
>
>
> Seems like a bug, because the actors array will be filled with
> pointers to the same Actor object, and if you change one of them, all
> will be changed.
Thanks for helping me learn. That was another big problem.
Here's the profile of the defective attached code:
WARNING: The profile data buffer is full; profiling probably terminated
before your program finished. To profile for longer runs, call
Profile.init
with a larger buffer and/or larger delay.
1 ...src/unix/linux-core.c; uv__hrtime; line: 333
13258 task.jl; anonymous; line: 96
13258 REPL.jl; eval_user_input; line: 54
13258 profile.jl; anonymous; line: 14
13258 ./loading.jl; include_from_node1; line: 128
13258 ./boot.jl; include; line: 245
5 ./inference.jl; typeinf_ext; line: 1214 # I assume this is
the compiler
1 ./inference.jl; typeinf; line: 1296
1 ./inference.jl; typeinf; line: 1402
1 ./inference.jl; abstract_interpret; line: 1088
1 ./inference.jl; abstract_eval; line: 933
1 ./inference.jl; abstract_eval_call; line: 881
1 ./inference.jl; typeinf; line: 1509
1 ./inference.jl; stupdate; line: 1167
1 ./inference.jl; typeinf; line: 1525
1 ./inference.jl; inlining_pass; line: 2522
1 ./inference.jl; inlining_pass; line: 2559
1 ./inference.jl; inlining_pass; line: 2623
1 ./inference.jl; inlineable; line: 2354
1 ./inference.jl; effect_free; line: 1976
1 ./inference.jl; effect_free; line: 1976
1 ./inference.jl; effect_free; line: 1927
1 ./inference.jl; is_known_call_p; line: 2763
1 ./inference.jl; typeinf; line: 1529
1 ./inference.jl; tuple_elim_pass; line: 2937
1 ./inference.jl; remove_redundant_temp_vars; line: 2790
1 ./reduce.jl; mapreduce; line: 172
1 ./reduce.jl; _mapreduce; line: 162
13235 no file; anonymous; line: 27 # I assume this is in my
program file
1 ...buv/src/unix/core.c; uv_hrtime; line: 86
2450 array.jl; +; line: 770
199 array.jl; .+; line: 756
2250 array.jl; .+; line: 758
1 array.jl; .+; line: 760
2 random.jl; randn; line: 260
1 random.jl; randn!; line: 257
1 dSFMT.jl; randmtzig_randn; line: 598
Ross
>
> Regards Ivar
>
>
> kl. 09:01:12 UTC+2 lørdag 26. juli 2014 skrev Ross Boylan følgende:
> In julia 0.3, my module includes
> const npop = 500 #population size
> This takes a very noticeable (1 minute?) amount of time to
> compile.
>
> If I set const npop=50 instead, compilation is instant (i.e.,
> no
> perceptible delay).
>
> The module does create some Arrays that have npop as one of
> their
> dimensions. Some are 500 x 40.
>
> Is this behavior expected?
>
> I might believe different code is generated when npop is very
> small, but
> for 50 is already well over what I'd call small.
>
> Ross Boylan
>
> P.S. the module level variables involving npop are (with
> nstep=40,
> n1=250)
>
> counts = zeros(Int, 3, nstep)
>
> affinity = randn(npop, npop) # will be modified later
>
> x = [ i<=n1 ? 1 : 2 for i in 1:npop]
>
> beta = [0.5 -0.5; -0.5 0.5]
>
> for i = 1:npop, j=1:npop
> affinity += beta[x[i], x[j]]
> end
>
> type Actor
> cut :: Vector{FloatingPoint} # track cut point for each
> time
> matches :: Matrix{Int} # 2xtime successful matches with
> each type
> end
>
> Actor() = Actor(2ones(FloatingPoint, nstep), zeros(Int, 2,
> nstep))
>
> actors = Array(Actor, npop)
> fill!(actors, Actor())
>
> I also specify the random seed, and so in principle everything
> could be
> set up at compile time.
>
module Sim
const npop = 300 #population size
const nstep = 40 # number of time steps
const n1 = floor(npop*0.5) # 1:n1 have type 1, rest are type 2
const delta_cut = 0.1 # amount to move cut
const ncandidates = 5 # number of partners to try
srand(1234) # reproducible random numbers
# counts records successful matches for all actors at a given time
# the count is of the matches, not the view from individuals
# first index 1 = 1-1 match; 2 = 2-2 match; 3 = 1-2 or 2-1 match
counts = zeros(Int, 3, nstep)
# affinity[i, j] is how much i likes j as a partner
# we begin with the random component
affinity = randn(npop, npop) # will be modified later
# covariates. just one for type 1 or 2
x = [ i<=n1 ? 1 : 2 for i in 1:npop]
#beta[i, j] what type i thinks of type j
beta = [0.5 -0.5; -0.5 0.5]
for i = 1:npop, j=1:npop
affinity += beta[x[i], x[j]]
end
# affinity is now set and assumed constant across time
type Actor
cut :: Vector{FloatingPoint} # track cut point for each time
matches :: Matrix{Int} # 2xtime successful matches with each type
end
Actor() = Actor(ones(FloatingPoint, nstep), zeros(Int, 2, nstep))
actors = [Actor() for i in 1:npop]
# return tuple of boolean that is true if enough data, estimated success rate
# as of time now
function success_rate(a::Actor, now::Int)
const max_window = 15
const min_window = 4
if now == 1
return false, 0.0
end
cut = a.cut[now]
lowest = max(1, now-max_window)
first_same = now
for first_same in now-1:-1:lowest
if cut != a.cut[first_same]
first_same += 1
break
end
end
if first_same+min_window-1 > now
return false, 0.0
end
# we have enough for an average
# might weight this by time at some point
return true, mean(sum(a.matches[:, first_same:now], 1))
end
# set cut for now+1
function adjust_cut!(a::Actor, now::Int)
const target_rate = 1.0 # desired matches per tick
enough, rate = success_rate(a, now)
cutnow = a.cut[now]
if ! enough
a.cut[now+1] = cutnow
else
if rate > target_rate*1.15
a.cut[now+1] = cutnow+delta_cut
elseif rate < target_rate*.85
a.cut[now+1] = cutnow-delta_cut
else
a.cut[now+1] = cutnow
end
end
end
# return true if j is an acceptable match for i
# references global affinity and actors
function accept(i::Int, j::Int, now::Int)
global affinity, actors
return affinity[i, j] >= actors[i].cut[now]
end
# record a successful match
function match!(i::Int, j::Int, now::Int)
global actors, counts
ti = x[i]
tj = x[j]
if ti == tj
counts[ti, now] += 1
else
counts[3, now] += 1
end
actors[i].matches[tj, now] += 1
actors[j].matches[ti, now] += 1
end
function go()
for time = 1:nstep
for i = 1:npop
js = unique(rand(1:npop, ncandidates))
for j in js
if i != j && accept(i, j, time) && accept(j, i, time)
match!(i, j, time)
end
end
end
# must adjust cuts after all matches done
if time < nstep
for i = 1:npop
adjust_cut!(actors[i], time)
end
end
end # of time loop
counts
end
# analysis
# If we did a census and asked each person to list their
# partners these are the totals we would get per group
# indices are source group, dest group, time
function count_partners(counts::Matrix{Int})
dims = size(counts)
function counter(i::Int, j::Int, t::Int)
if i == j
2*counts[i, t]
else
counts[3, t]
end
end
[counter(i, j, t) for i=1:2, j=1:2, t=1:nstep]
end
# get self share from the ties NOT the partner counts
function share(counts::Matrix{Int})
cs = count_partners(counts)
function inner(i::Int, t::Int)
cs[i, i, t]/sum(cs[i, :, t])
end
[inner(i, t) for i=1:2, t=1:nstep]
end
end