I've been trying to optimize a sampling routine and have found the 
flat-formatted profiler output very useful.  But I'm finding it a little 
tricky to interpret sometimes.  I have a slightly tweaked version that 
reports percentage of samples (instead of the sample count), and also shows 
the original line from the file (ideally, an HTML or GUI view would be 
best...)  Here are lines with >1% incidence.  (Also posted 
at https://gist.github.com/brendano/8202837 ).  I've bolded a few lines:

13150 samples ~ 0.4% se, though more in reality
       File                                            Function           
 Line
50.3 % treetm.jl cgsIterPath          383             qnewLL,pnewLL = 
proposePath!(newpath, mm, V,di,word, first ? nothing : oldpath, :simulate)
44.1 % treetm.jl cgsIterPath          384             qoldLL,poldLL = 
proposePath!(oldpath, mm, V,di,word, first ? nothing : oldpath, :evaluate)
 4.1 % treetm.jl getindex              79 getindex(c::ClassCountTable, k) = 
c.counts[k]
 2.1 % treetm.jl incrementFullpath!   122         x = b ? x.right : x.left
22.0 % treetm.jl proposePath!         337         w0 = (n0.counts[wordID] + 
betaHere/V - on0) / (n0.counts.total + betaHere - int(on_cur))
18.9 % treetm.jl proposePath!         338         w1 = (n1.counts[wordID] + 
betaHere/V - on1) / (n1.counts.total + betaHere - int(on_cur))
 5.2 % treetm.jl proposePath!         339         p0 = 
(cur_docnode.left.count + mm.gammaConc/2 - on0)
 6.5 % treetm.jl proposePath!         340         p1 = 
(cur_docnode.right.count+ mm.gammaConc/2 - on1)
* 8.1 % treetm.jl proposePath!         341         q = p1*w1 / (p1*w1 + 
p0*w0)*
 2.3 % treetm.jl proposePath!         344             dec = bool(rand() < q)
 1.1 % treetm.jl proposePath!         345             path[curdepth] = dec
* 8.5 % treetm.jl proposePath!         350         proposalLL += dec ? 
log(q) : log(1-q)*
11.4 % treetm.jl proposePath!         351         unnormLL += dec ? 
log(p1/(p0+p1)) : log(p0/(p0+p1))   # only the prior branch prob
 4.2 % treetm.jl proposePath!         367     wp = (nn.counts[wordID] + 
mm.betaConc/V +rem) / (nn.counts.total + mm.betaConc +rem)
* 2.0 % math.jl   log                  290 *
100.0 % no file  anonymous             14 

I'm trying to understand what makes up the large runtime of line 350.  I 
would think it is slow because of the logarithm.  However, math.jl's log() 
is only 2%, which is a lot less than 8.5%.  Can the branching decision for 
'dec', and the += operation, really be responsible for the rest of the time?

Or, is it wrong to interpret this so directly?  I'm now wondering if there 
was some sort of compiler optimization, so that q's calculation (line 341) 
is actually computed later around the time of line 350, at least as how the 
profiler reports it?

Unfortunately, since proposePath!() is a fairly long function, I can't make 
heads or tails of code_llvm().  I posted more at the github gist if anyone 
is curious.

Another thing I'm curious about is whether object attribute lookups are 
fast: in particular, if they might be a cause of slowdown for lines 337 and 
338.  This was my experience with Cython (that eliminating slow python-land 
lookups gave a lot of speedup), but maybe it's the wrong way to think about 
Julia.  Are abstract types more problematic than concrete types, for 
attribute lookup?  (I think I saw an effect there, but don't have a 
detailed example at the moment.  The current implementation the above 
profiling output stems from uses a concrete node type.)

Brendan

Reply via email to