Great that you were able to learn something! Next time you should probably check that your code produces correct answers before worrying about performance :P
Attachments are not frowned upon, but personally I like gist.github.com (or similar services) much better, because they are easier to use on a phone or tablet, you get syntax highlighting, and you can start reading code after only one click. If the main email contains a clear description of what the problem is, you will usually get multiple good responses.

> Why isn't that 500*500 iterations?

Because I somehow thought one of the `= 1:npop` was `= 1:n1`. I don't know why I would have thought that. It is definitely incorrect.

> How do you know a temporary is being created?

Currently in Julia, x += y is just a fancy way of writing x = x + y. Julia does not (currently) care whether there are other active references to the array pointed to by x in other scopes, so it has to allocate a new array to store the sum of x and y, and then point the local x variable to the new array. See #7052 <https://github.com/JuliaLang/julia/issues/7052> for how the discussion about making += do the update in place failed.

At 21:01:10 UTC+2 on Saturday, 26 July 2014, Ross Boylan wrote:

> On Sat, 2014-07-26 at 01:38 -0700, Ivar Nesje wrote:
> > It is hard to test my assumptions when you don't post a complete
> > runnable example, and profiling is a very useful tool in situations
> > like this. It is also a little unclear what you mean by compile time,
> > as Julia compiles the code while executing (JIT compiling).
>
> Your assumptions all seem to have been correct. Sorry I didn't include
> the code; I wasn't sure if attachments were OK and I hadn't reduced it
> to a minimal example. Partially corrected version attached.
>
> I should have said execute
>     include("/home/ross/UCSF/HIVRace/simpleMatch.jl"),
> rather than "compile".
>
> Profiling this indicates almost all the time going to the
>     affinity += beta[x[i], x[j]]
> inside the loop you highlighted (next quoted section). At least I think
> it does; the profile says
>     no file; anonymous; line: 27
> and that is line 27 of simpleMatch.jl.
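The allocation described above is easy to observe: `x += y` rebinds `x` to a freshly allocated result array, while an explicit element loop mutates the existing array. A minimal sketch (written in post-1.0 syntax, so newer than the Julia 0.3 of this thread):

```julia
x = zeros(3)
y = ones(3)
before = x
x += y                    # sugar for x = x + y: allocates a new array
@assert !(x === before)   # x no longer refers to the original array

# In-place alternative: update each element; no new array is allocated.
z = zeros(3)
before_z = z
for i in eachindex(z)
    z[i] += y[i]
end
@assert z === before_z    # still the same array object
@assert z == [1.0, 1.0, 1.0]
```

(Later Julia versions also gained `z .+= y`, which updates in place; that did not exist at the time of this thread.)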
> I'm not sure why the file isn't identified. (Full profile at the
> bottom.)
>
> > After a simple reading of the posted code, the following line stood
> > out:
> >
> >     for i = 1:npop, j = 1:npop
> >         affinity += beta[x[i], x[j]]
> >     end
> >
> > This loop runs for 250 * 500 iterations and allocates a 500 * 500
>
> Why isn't that 500*500 iterations? BTW I lowered npop to 300 for the
> profile.
>
> > array on each iteration (because for some reason Julia has decided
> > that x += y is not a special in-place increment operation, but just a
> > short alias for x = x + y). It seems to me that this could be written
> > in a much more efficient way without creating temporary arrays.
>
> How do you know a temporary is being created?
>
> Oh... that code is completely wrong. It should be
>     affinity[i, j] += beta[x[i], x[j]]
>
> So it was 500^3 operations. With that fix, everything is back to
> "instantaneous". And the simulation results are looking much more
> sensible.
>
> > Another thing that might cause performance grief is that you have
> >
> >     cut :: Vector{FloatingPoint}
> >     # Instead of
> >     cut::Vector{Float64}
>
> Will do.
>
> > FloatingPoint is an abstract type that lets you store different
> > floating-point types in the array (like Float16, Float32, Float64,
> > Float128 and BigFloat). Unfortunately, that means that for every
> > operation on the array, Julia needs to check what type the element
> > has and find the right method for the operation. That is much slower
> > than when it knows at compile time that the type is Float64.
> >
> > The last line:
> >
> >     fill!(actors, Actor())
> >
> > seems like a bug, because the actors array will be filled with
> > pointers to the same Actor object, and if you change one of them, all
> > will be changed.
>
> Thanks for helping me learn. That was another big problem.
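Both pitfalls quoted above are easy to demonstrate. Note that `FloatingPoint` was later renamed `AbstractFloat` and `type` became `mutable struct`; this sketch uses the post-1.0 names:

```julia
# Pitfall 1: an abstract element type means every element is boxed, so
# each operation must look up the element's concrete type at run time.
boxed    = AbstractFloat[1.0, 2.0, 3.0]   # FloatingPoint in Julia 0.3
concrete = Float64[1.0, 2.0, 3.0]
@assert eltype(boxed) == AbstractFloat    # not concrete: slow paths
@assert isconcretetype(eltype(concrete))  # concrete: fast, specialized code

# Pitfall 2: fill! stores the *same* object reference in every slot.
mutable struct Actor
    cut::Vector{Float64}
end
actors = Vector{Actor}(undef, 4)
fill!(actors, Actor(zeros(2)))
actors[1].cut[1] = 99.0
@assert actors[2].cut[1] == 99.0          # every slot aliases one Actor

# Fix: construct a distinct Actor for each slot.
distinct = [Actor(zeros(2)) for _ in 1:4]
distinct[1].cut[1] = 99.0
@assert distinct[2].cut[1] == 0.0         # other slots are unaffected
```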
> Here's the profile of the defective attached code:
>
> WARNING: The profile data buffer is full; profiling probably terminated
> before your program finished. To profile for longer runs, call
> Profile.init with a larger buffer and/or larger delay.
>
>     1     ...src/unix/linux-core.c; uv__hrtime; line: 333
>     13258 task.jl; anonymous; line: 96
>     13258 REPL.jl; eval_user_input; line: 54
>     13258 profile.jl; anonymous; line: 14
>     13258 ./loading.jl; include_from_node1; line: 128
>     13258 ./boot.jl; include; line: 245
>     5     ./inference.jl; typeinf_ext; line: 1214   # I assume this is the compiler
>     1     ./inference.jl; typeinf; line: 1296
>     1     ./inference.jl; typeinf; line: 1402
>     1     ./inference.jl; abstract_interpret; line: 1088
>     1     ./inference.jl; abstract_eval; line: 933
>     1     ./inference.jl; abstract_eval_call; line: 881
>     1     ./inference.jl; typeinf; line: 1509
>     1     ./inference.jl; stupdate; line: 1167
>     1     ./inference.jl; typeinf; line: 1525
>     1     ./inference.jl; inlining_pass; line: 2522
>     1     ./inference.jl; inlining_pass; line: 2559
>     1     ./inference.jl; inlining_pass; line: 2623
>     1     ./inference.jl; inlineable; line: 2354
>     1     ./inference.jl; effect_free; line: 1976
>     1     ./inference.jl; effect_free; line: 1976
>     1     ./inference.jl; effect_free; line: 1927
>     1     ./inference.jl; is_known_call_p; line: 2763
>     1     ./inference.jl; typeinf; line: 1529
>     1     ./inference.jl; tuple_elim_pass; line: 2937
>     1     ./inference.jl; remove_redundant_temp_vars; line: 2790
>     1     ./reduce.jl; mapreduce; line: 172
>     1     ./reduce.jl; _mapreduce; line: 162
>     13235 no file; anonymous; line: 27   # I assume this is in my program file
>     1     ...buv/src/unix/core.c; uv_hrtime; line: 86
>     2450  array.jl; +; line: 770
>     199   array.jl; .+; line: 756
>     2250  array.jl; .+; line: 758
>     1     array.jl; .+; line: 760
>     2     random.jl; randn; line: 260
>     1     random.jl; randn!; line: 257
>     1     dSFMT.jl; randmtzig_randn; line: 598
>
> Ross
>
> > Regards, Ivar
> >
> > At 09:01:12 UTC+2 on Saturday, 26 July 2014, Ross Boylan wrote:
> >
> > > In Julia 0.3, my module includes
> > >
> > >     const npop = 500  # population size
> > >
> > > This takes a very noticeable (1 minute?) amount of time to compile.
> > >
> > > If I set const npop = 50 instead, compilation is instant (i.e., no
> > > perceptible delay).
> > >
> > > The module does create some Arrays that have npop as one of their
> > > dimensions. Some are 500 x 40.
> > >
> > > Is this behavior expected?
> > >
> > > I might believe different code is generated when npop is very
> > > small, but 50 is already well over what I'd call small.
> > >
> > > Ross Boylan
> > >
> > > P.S. The module-level variables involving npop are (with nstep = 40,
> > > n1 = 250):
> > >
> > >     counts = zeros(Int, 3, nstep)
> > >     affinity = randn(npop, npop)  # will be modified later
> > >     x = [ i <= n1 ? 1 : 2 for i in 1:npop]
> > >     beta = [0.5 -0.5; -0.5 0.5]
> > >
> > >     for i = 1:npop, j = 1:npop
> > >         affinity += beta[x[i], x[j]]
> > >     end
> > >
> > >     type Actor
> > >         cut :: Vector{FloatingPoint}  # track cut point for each time
> > >         matches :: Matrix{Int}  # 2xtime successful matches with each type
> > >     end
> > >
> > >     Actor() = Actor(2ones(FloatingPoint, nstep), zeros(Int, 2, nstep))
> > >
> > >     actors = Array(Actor, npop)
> > >     fill!(actors, Actor())
> > >
> > > I also specify the random seed, and so in principle everything
> > > could be set up at compile time.
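Pulling together the three fixes discussed in this thread (an elementwise loop instead of `affinity +=`, a concrete `Float64` element type instead of `FloatingPoint`, and a comprehension instead of `fill!`), a corrected sketch of the setup code, shrunk for illustration and updated to post-1.0 syntax:

```julia
const nstep = 40
const npop  = 10     # 500 in the original thread
const n1    = 5      # 250 in the original thread

x    = [i <= n1 ? 1 : 2 for i in 1:npop]
beta = [0.5 -0.5; -0.5 0.5]

affinity = randn(npop, npop)
for i in 1:npop, j in 1:npop
    affinity[i, j] += beta[x[i], x[j]]   # elementwise, not `affinity +=`
end

mutable struct Actor                     # `type` in Julia 0.3
    cut::Vector{Float64}                 # concrete eltype, not FloatingPoint
    matches::Matrix{Int}                 # 2 x nstep successful matches
end
Actor() = Actor(fill(2.0, nstep), zeros(Int, 2, nstep))

actors = [Actor() for _ in 1:npop]       # distinct objects, unlike fill!
@assert !(actors[1] === actors[2])
@assert size(affinity) == (npop, npop)
```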
