Hi,

I am struggling with an issue related to garbage collection taking up the 
vast majority (>99%) of compute time on a simple nested for loop. Code 
excerpt below:

# julia version 0.3.7
# counts is an MxN matrix of Float64
# N=15000
# M=108
# h_cols and c_cols are indices \in {1:M}
using HypothesisTests, ArrayViews
ratios = Array(Float64,M)
function compute!(S,tot_i::Int64,i::Int64,j::Int64)
    ratios = view(counts,:,i)./view(counts,:,j)
    #for k=1:M
    #  ratios[k] = counts[k,i]/counts[k,j]
    #end
    #ratios = counts[:,i]./counts[:,j]
    t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols])
    S[tot_i] = (pvalue(t),i,j)
end

for i=1:N-1
  @time for j=(i+1):N
    tot_i += 1
    compute!(S,tot_i,i,j)
  end
end

The loop begins fast, output from time:

elapsed time: 1.023850054 seconds (62027220 bytes allocated)
elapsed time: 0.170916977 seconds (45785624 bytes allocated)
elapsed time: 0.171598156 seconds (45782760 bytes allocated)
elapsed time: 0.173866309 seconds (45779896 bytes allocated)
elapsed time: 0.170267172 seconds (45777032 bytes allocated)
elapsed time: 0.171754713 seconds (45774168 bytes allocated)
elapsed time: 0.170110142 seconds (45771304 bytes allocated)
elapsed time: 0.175199053 seconds (45768440 bytes allocated)
elapsed time: 0.179893161 seconds (45765576 bytes allocated)
elapsed time: 0.212172824 seconds (45762712 bytes allocated)
elapsed time: 0.252750549 seconds (45759848 bytes allocated)
elapsed time: 0.254874855 seconds (45756984 bytes allocated)
elapsed time: 0.231003319 seconds (45754120 bytes allocated)
elapsed time: 0.235060195 seconds (45751256 bytes allocated)
elapsed time: 0.235379355 seconds (45748392 bytes allocated)
elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc time)
elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc time)

But as soon as it starts doing gc the % time spent in increases almost 
indefinitely, output from time much later:

elapsed time: 0.174122929 seconds (36239160 bytes allocated)
elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc 
time)
elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc 
time)
elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc 
time)
elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc 
time)
elapsed time: 0.169849999 seconds (36224200 bytes allocated)

The inner loop, despite iterating over fewer and fewer indices has 
massively increased the gc, and therefore overall, execution time. I have 
tried many things, including creating the compute function, devectorizing 
the ratios calculation (which really slowed things down), using view and 
sub in various places, profiling with --trace-allocation=all but I can't 
figure out why this happens or how to fix it. Doing gc for 22 seconds 
obviously kills the performance, and since there are about 22M iterations 
this is prohibitive. Can anyone suggest something I can try to improve the 
performance here?

Thanks,
Adam

Reply via email to