Looking at this now, what are the types of the variables on the bolded lines? 
If they’re specific real-valued types, I’m surprised they’re so slow.

Right now it’s not possible to have an abstract type with fields, so field 
lookup should be pretty consistently fast.
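
For example, here's a minimal sketch of what I mean (made-up type and field names, not the ones in treetm.jl): what matters for speed is whether the *declared* field types are concrete, so the compiler knows what each access returns.

    type TypedCounts            # concrete field types: accesses compile to direct loads
        total::Float64
        vals::Vector{Float64}
    end

    type UntypedCounts          # untyped fields default to Any, so uses of them box and dispatch
        total
        vals
    end

    f(c, k) = c.vals[k] / c.total

    # code_typed(f, (TypedCounts, Int)) comes back as plain Float64 arithmetic, while
    # code_typed(f, (UntypedCounts, Int)) is full of Any and generic calls.

If the values on lines 337/338 are inferred as Any (say, because some field or container is declared with an abstract element type), the arithmetic there goes through generic dispatch, which could account for a lot of the time.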

 — John

On Dec 31, 2013, at 6:20 PM, Brendan O'Connor <[email protected]> wrote:

> I've been trying to optimize a sampling routine and have found the 
> flat-formatted profiler output very useful.  But I'm finding it a little 
> tricky to interpret sometimes.  I have a slightly tweaked version that 
> reports percentage of samples (instead of the sample count), and also shows 
> the original line from the file (ideally, an HTML or GUI view would be 
> best...)  Here are lines with >1% incidence.  (Also posted at 
> https://gist.github.com/brendano/8202837 ).  I've bolded a few lines:
> 
> 13150 samples ~ 0.4% se, though more in reality
>          File       Function             Line
>  50.3 % treetm.jl  cgsIterPath          383   qnewLL,pnewLL = proposePath!(newpath, mm, V,di,word, first ? nothing : oldpath, :simulate)
>  44.1 % treetm.jl  cgsIterPath          384   qoldLL,poldLL = proposePath!(oldpath, mm, V,di,word, first ? nothing : oldpath, :evaluate)
>   4.1 % treetm.jl  getindex              79   getindex(c::ClassCountTable, k) = c.counts[k]
>   2.1 % treetm.jl  incrementFullpath!   122   x = b ? x.right : x.left
>  22.0 % treetm.jl  proposePath!         337   w0 = (n0.counts[wordID] + betaHere/V - on0) / (n0.counts.total + betaHere - int(on_cur))
>  18.9 % treetm.jl  proposePath!         338   w1 = (n1.counts[wordID] + betaHere/V - on1) / (n1.counts.total + betaHere - int(on_cur))
>   5.2 % treetm.jl  proposePath!         339   p0 = (cur_docnode.left.count + mm.gammaConc/2 - on0)
>   6.5 % treetm.jl  proposePath!         340   p1 = (cur_docnode.right.count + mm.gammaConc/2 - on1)
>   8.1 % treetm.jl  proposePath!         341   q = p1*w1 / (p1*w1 + p0*w0)
>   2.3 % treetm.jl  proposePath!         344   dec = bool(rand() < q)
>   1.1 % treetm.jl  proposePath!         345   path[curdepth] = dec
>   8.5 % treetm.jl  proposePath!         350   proposalLL += dec ? log(q) : log(1-q)
>  11.4 % treetm.jl  proposePath!         351   unnormLL += dec ? log(p1/(p0+p1)) : log(p0/(p0+p1))   # only the prior branch prob
>   4.2 % treetm.jl  proposePath!         367   wp = (nn.counts[wordID] + mm.betaConc/V +rem) / (nn.counts.total + mm.betaConc +rem)
>   2.0 % math.jl    log                  290
> 100.0 % no file    anonymous             14
> 
> I'm trying to understand what makes up the large runtime of line 350.  I 
> would think it is slow because of the logarithm.  However, math.jl's log() is 
> only 2%, which is a lot less than 8.5%.  Can the branching decision for 
> 'dec', and the += operation, really be responsible for the rest of the time?
> 
> Or is it wrong to interpret this so directly?  I'm now wondering whether some 
> compiler optimization moved q's calculation (line 341) so that it actually runs 
> later, around line 350, at least as far as the profiler reports it?
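> 
> (A hypothetical way to check would be to split line 350 in two, e.g.
> 
>     lq = dec ? log(q) : log(1-q)
>     proposalLL += lq
> 
> so the profiler would attribute the samples for the log separately from the +=.)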
> 
> Unfortunately, since proposePath!() is a fairly long function, I can't make 
> heads or tails of code_llvm().  I posted more at the github gist if anyone is 
> curious.
> 
> Another thing I'm curious about is whether object attribute lookups are fast: 
> in particular, whether they might be a cause of slowdown on lines 337 and 338.  
> That was my experience with Cython (eliminating slow Python-land attribute 
> lookups gave a lot of speedup), but maybe it's the wrong way to think about 
> Julia.  Are abstract types more problematic than concrete types for attribute 
> lookup?  (I think I saw an effect there, but I don't have a detailed example at 
> the moment.  The implementation that the above profiling output comes from 
> uses a concrete node type.)
> 
> Brendan
> 
