I've been trying to optimize a sampling routine and have found the
flat-formatted profiler output very useful, but a little tricky to
interpret at times. I have a slightly tweaked version that reports the
percentage of samples (instead of the raw sample count) and also shows
the original source line from the file (ideally, an HTML or GUI view
would be best...). Here are the lines with >1% of samples (also posted
at https://gist.github.com/brendano/8202837 ). I've bolded a few lines:
13150 samples ~ 0.4% s.e., though more in reality

   %      File       Function            Line  Source
  50.3 %  treetm.jl  cgsIterPath          383  qnewLL,pnewLL = proposePath!(newpath, mm, V,di,word, first ? nothing : oldpath, :simulate)
  44.1 %  treetm.jl  cgsIterPath          384  qoldLL,poldLL = proposePath!(oldpath, mm, V,di,word, first ? nothing : oldpath, :evaluate)
   4.1 %  treetm.jl  getindex              79  getindex(c::ClassCountTable, k) = c.counts[k]
   2.1 %  treetm.jl  incrementFullpath!   122  x = b ? x.right : x.left
  22.0 %  treetm.jl  proposePath!         337  w0 = (n0.counts[wordID] + betaHere/V - on0) / (n0.counts.total + betaHere - int(on_cur))
  18.9 %  treetm.jl  proposePath!         338  w1 = (n1.counts[wordID] + betaHere/V - on1) / (n1.counts.total + betaHere - int(on_cur))
   5.2 %  treetm.jl  proposePath!         339  p0 = (cur_docnode.left.count + mm.gammaConc/2 - on0)
   6.5 %  treetm.jl  proposePath!         340  p1 = (cur_docnode.right.count + mm.gammaConc/2 - on1)
*  8.1 %  treetm.jl  proposePath!         341  q = p1*w1 / (p1*w1 + p0*w0) *
   2.3 %  treetm.jl  proposePath!         344  dec = bool(rand() < q)
   1.1 %  treetm.jl  proposePath!         345  path[curdepth] = dec
*  8.5 %  treetm.jl  proposePath!         350  proposalLL += dec ? log(q) : log(1-q) *
  11.4 %  treetm.jl  proposePath!         351  unnormLL += dec ? log(p1/(p0+p1)) : log(p0/(p0+p1)) # only the prior branch prob
   4.2 %  treetm.jl  proposePath!         367  wp = (nn.counts[wordID] + mm.betaConc/V + rem) / (nn.counts.total + mm.betaConc + rem)
*  2.0 %  math.jl    log                  290 *
 100.0 %  no file    anonymous             14
I'm trying to understand what makes up the large runtime of line 350. I
would have thought it is slow because of the logarithm, but math.jl's
log() accounts for only 2% of samples, far less than line 350's 8.5%.
Can the branching decision on 'dec', plus the += operation, really be
responsible for the rest of the time? Or is it wrong to interpret the
output this directly? I'm now wondering whether some compiler
optimization reordered the work, so that q's calculation (line 341) is
actually performed around the time of line 350, at least as the profiler
reports it.
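One way I could sanity-check how much of line 350's cost the logarithm can explain is a standalone microbenchmark. A minimal sketch (all names here are made up, not from treetm.jl), comparing a plain log() loop against the same branch-plus-accumulate pattern as line 350:

```julia
# Hypothetical microbenchmark: log() alone vs. the dec ? log(q) : log(1-q)
# accumulation pattern from line 350.
function sum_log(qs)
    s = 0.0
    for q in qs
        s += log(q)
    end
    return s
end

function sum_branch_log(qs, decs)
    s = 0.0
    @inbounds for i in eachindex(qs)
        s += decs[i] ? log(qs[i]) : log(1 - qs[i])
    end
    return s
end

qs = 0.01 .+ 0.98 .* rand(1_000_000)   # keep q strictly inside (0, 1)
decs = rand(Bool, 1_000_000)

sum_log(qs); sum_branch_log(qs, decs)  # warm up the JIT first
@time sum_log(qs)
@time sum_branch_log(qs, decs)
```

If the two timings are close, the branch and += are essentially free and the profiler attribution is misleading; if they differ a lot, the extra work on that line is real.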
Unfortunately, since proposePath!() is a fairly long function, I can't
make heads or tails of its code_llvm() output. I posted more at the
GitHub gist if anyone is curious.
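One workaround when code_llvm() output for a long function is unreadable: factor the hot expression into a tiny function of its own and inspect that in isolation. A sketch with made-up names, mirroring the expression on line 350:

```julia
# Hypothetical extraction of line 350's expression into a small function,
# so @code_llvm produces only a few dozen lines of IR instead of the
# whole body of proposePath!().
step_ll(proposalLL::Float64, dec::Bool, q::Float64) =
    proposalLL + (dec ? log(q) : log(1 - q))

# @code_llvm step_ll(0.0, true, 0.3)
```

Because Julia specializes and inlines small functions, the extracted version should compile to essentially the same code the inlined expression does, so its IR is representative.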
Another thing I'm curious about is whether object attribute (field)
lookups are fast: in particular, whether they might be a cause of the
slowdown on lines 337 and 338. That was my experience with Cython
(eliminating slow Python-land lookups gave a lot of speedup), but maybe
it's the wrong way to think about Julia. Are abstract types more
problematic than concrete types for attribute lookup? (I think I saw an
effect there, but don't have a detailed example at the moment. The
implementation the above profiling output stems from uses a concrete
node type.)
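For what it's worth, my understanding is that what matters is the declared type of the *field*, not of the object: a field annotated with an abstract type defeats type inference on every access, while a concrete annotation compiles to a direct load. A sketch in current Julia syntax, with hypothetical types (not the real treetm.jl definitions):

```julia
# Hypothetical count/node types illustrating abstract vs. concrete
# field annotations.
abstract type AbstractCounts end

struct Counts <: AbstractCounts
    total::Int
end

struct LooseNode
    counts::AbstractCounts   # abstract field: value is boxed, access is dynamic
end

struct TightNode
    counts::Counts           # concrete field: access is a plain, inlinable load
end

total_count(n) = n.counts.total

# @code_warntype total_count(LooseNode(Counts(3)))  # result inferred as Any
# @code_warntype(total_count(TightNode(Counts(3)))) # result inferred as Int
```

So if the node type and all of its fields (counts tables included) are concretely typed, field lookups on lines 337-338 should not behave like Python-land attribute lookups; if any field is declared abstract, the Cython intuition applies.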
Brendan