Looking at this now, what are the types of the variables on the bolded lines? If they’re specific real-valued types, I’m surprised they’re so slow.
Right now it’s not possible to have an abstract type with fields, so field lookup should be pretty consistently fast. — John On Dec 31, 2013, at 6:20 PM, Brendan O'Connor <[email protected]> wrote: > I've been trying to optimize a sampling routine and have found the > flat-formatted profiler output very useful. But I'm finding it a little > tricky to interpret sometimes. I have a slightly tweaked version that > reports percentage of samples (instead of the sample count), and also shows > the original line from the file (ideally, an HTML or GUI view would be > best...) Here are lines with >1% incidence. (Also posted at > https://gist.github.com/brendano/8202837 ). I've bolded a few lines: > > 13150 samples ~ 0.4% se, though more in reality > File Function > Line > 50.3 % treetm.jl cgsIterPath 383 qnewLL,pnewLL = > proposePath!(newpath, mm, V,di,word, first ? nothing : oldpath, :simulate) > 44.1 % treetm.jl cgsIterPath 384 qoldLL,poldLL = > proposePath!(oldpath, mm, V,di,word, first ? nothing : oldpath, :evaluate) > 4.1 % treetm.jl getindex 79 getindex(c::ClassCountTable, k) = > c.counts[k] > 2.1 % treetm.jl incrementFullpath! 122 x = b ? x.right : x.left > 22.0 % treetm.jl proposePath! 337 w0 = (n0.counts[wordID] + > betaHere/V - on0) / (n0.counts.total + betaHere - int(on_cur)) > 18.9 % treetm.jl proposePath! 338 w1 = (n1.counts[wordID] + > betaHere/V - on1) / (n1.counts.total + betaHere - int(on_cur)) > 5.2 % treetm.jl proposePath! 339 p0 = > (cur_docnode.left.count + mm.gammaConc/2 - on0) > 6.5 % treetm.jl proposePath! 340 p1 = > (cur_docnode.right.count+ mm.gammaConc/2 - on1) > 8.1 % treetm.jl proposePath! 341 q = p1*w1 / (p1*w1 + p0*w0) > 2.3 % treetm.jl proposePath! 344 dec = bool(rand() < q) > 1.1 % treetm.jl proposePath! 345 path[curdepth] = dec > 8.5 % treetm.jl proposePath! 350 proposalLL += dec ? log(q) > : log(1-q) > 11.4 % treetm.jl proposePath! 351 unnormLL += dec ? > log(p1/(p0+p1)) : log(p0/(p0+p1)) # only the prior branch prob > 4.2 % treetm.jl proposePath! 367 wp = (nn.counts[wordID] + > mm.betaConc/V +rem) / (nn.counts.total + mm.betaConc +rem) > 2.0 % math.jl log 290 > 100.0 % no file anonymous 14 > > I'm trying to understand what makes up the large runtime of line 350. I > would think it is slow because of the logarithm. However, math.jl's log() is > only 2%, which is a lot less than 8.5%. Can the branching decision for > 'dec', and the += operation, really be responsible for the rest of the time? > > Or, is it wrong to interpret this so directly? I'm now wondering if there > was some sort of compiler optimization, so that q's calculation (line 341) is > actually computed later around the time of line 350, at least as how the > profiler reports it? > > Unfortunately, since proposePath!() is a fairly long function, I can't make > heads or tails of code_llvm(). I posted more at the github gist if anyone is > curious. > > Another thing I'm curious about is whether object attribute lookups are fast: > in particular, if they might be a cause of slowdown for lines 337 and 338. > This was my experience with Cython (that eliminating slow python-land lookups > gave a lot of speedup), but maybe it's the wrong way to think about Julia. > Are abstract types more problematic than concrete types, for attribute > lookup? (I think I saw an effect there, but don't have a detailed example at > the moment. The current implementation the above profiling output stems from > uses a concrete node type.) > > Brendan >
