This might be an algorithm that benefits from the GC improvements in 0.4--with the caveat that 0.4 is still a work in progress and has a variety of syntax changes (and possibly more to come) you might want to give that a try.
On Friday, April 3, 2015 at 8:22:49 PM UTC-5, Adam Labadorf wrote: > > Hi, > > I am struggling with an issue related to garbage collection taking up the > vast majority (>99%) of compute time on a simple nested for loop. Code > excerpt below: > > # julia version 0.3.7 > # counts is an MxN matrix of Float64 > # N=15000 > # M=108 > # h_cols and c_cols are indices \in {1:M} > using HypothesisTests, ArrayViews > ratios = Array(Float64,M) > function compute!(S,tot_i::Int64,i::Int64,j::Int64) > ratios = view(counts,:,i)./view(counts,:,j) > #for k=1:M > # ratios[k] = counts[k,i]/counts[k,j] > #end > #ratios = counts[:,i]./counts[:,j] > t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols]) > S[tot_i] = (pvalue(t),i,j) > end > > for i=1:N-1 > @time for j=(i+1):N > tot_i += 1 > compute!(S,tot_i,i,j) > end > end > > The loop begins fast, output from time: > > elapsed time: 1.023850054 seconds (62027220 bytes allocated) > elapsed time: 0.170916977 seconds (45785624 bytes allocated) > elapsed time: 0.171598156 seconds (45782760 bytes allocated) > elapsed time: 0.173866309 seconds (45779896 bytes allocated) > elapsed time: 0.170267172 seconds (45777032 bytes allocated) > elapsed time: 0.171754713 seconds (45774168 bytes allocated) > elapsed time: 0.170110142 seconds (45771304 bytes allocated) > elapsed time: 0.175199053 seconds (45768440 bytes allocated) > elapsed time: 0.179893161 seconds (45765576 bytes allocated) > elapsed time: 0.212172824 seconds (45762712 bytes allocated) > elapsed time: 0.252750549 seconds (45759848 bytes allocated) > elapsed time: 0.254874855 seconds (45756984 bytes allocated) > elapsed time: 0.231003319 seconds (45754120 bytes allocated) > elapsed time: 0.235060195 seconds (45751256 bytes allocated) > elapsed time: 0.235379355 seconds (45748392 bytes allocated) > elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc > time) > elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc time) > > But as soon as it starts doing gc the % time spent in increases almost > indefinitely, output from time much later: > > elapsed time: 0.174122929 seconds (36239160 bytes allocated) > elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc > time) > elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc > time) > elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc > time) > elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc > time) > elapsed time: 0.169849999 seconds (36224200 bytes allocated) > > The inner loop, despite iterating over fewer and fewer indices has > massively increased the gc, and therefore overall, execution time. I have > tried many things, including creating the compute function, devectorizing > the ratios calculation (which really slowed things down), using view and > sub in various places, profiling with --trace-allocation=all but I can't > figure out why this happens or how to fix it. Doing gc for 22 seconds > obviously kills the performance, and since there are about 22M iterations > this is prohibitive. Can anyone suggest something I can try to improve the > performance here? > > Thanks, > Adam >