Devectorization should never slow anything down. If it does, then you have 
some other problem. Here, M is a global variable, and that's probably what's 
killing you. Performance tip #1:
http://docs.julialang.org/en/latest/manual/performance-tips/

--Tim

On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote:
> Hi,
> 
> I am struggling with an issue related to garbage collection taking up the
> vast majority (>99%) of compute time on a simple nested for loop. Code
> excerpt below:
> 
> # julia version 0.3.7
> # counts is an MxN matrix of Float64
> # N=15000
> # M=108
> # h_cols and c_cols are indices \in {1:M}
> using HypothesisTests, ArrayViews
> ratios = Array(Float64,M)
> function compute!(S,tot_i::Int64,i::Int64,j::Int64)
>     ratios = view(counts,:,i)./view(counts,:,j)
>     #for k=1:M
>     #  ratios[k] = counts[k,i]/counts[k,j]
>     #end
>     #ratios = counts[:,i]./counts[:,j]
>     t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols])
>     S[tot_i] = (pvalue(t),i,j)
> end
> 
> for i=1:N-1
>   @time for j=(i+1):N
>     tot_i += 1
>     compute!(S,tot_i,i,j)
>   end
> end
> 
> The loop begins fast, output from time:
> 
> elapsed time: 1.023850054 seconds (62027220 bytes allocated)
> elapsed time: 0.170916977 seconds (45785624 bytes allocated)
> elapsed time: 0.171598156 seconds (45782760 bytes allocated)
> elapsed time: 0.173866309 seconds (45779896 bytes allocated)
> elapsed time: 0.170267172 seconds (45777032 bytes allocated)
> elapsed time: 0.171754713 seconds (45774168 bytes allocated)
> elapsed time: 0.170110142 seconds (45771304 bytes allocated)
> elapsed time: 0.175199053 seconds (45768440 bytes allocated)
> elapsed time: 0.179893161 seconds (45765576 bytes allocated)
> elapsed time: 0.212172824 seconds (45762712 bytes allocated)
> elapsed time: 0.252750549 seconds (45759848 bytes allocated)
> elapsed time: 0.254874855 seconds (45756984 bytes allocated)
> elapsed time: 0.231003319 seconds (45754120 bytes allocated)
> elapsed time: 0.235060195 seconds (45751256 bytes allocated)
> elapsed time: 0.235379355 seconds (45748392 bytes allocated)
> elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc time)
> elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc time)
> 
> But as soon as it starts doing gc the % time spent in increases almost
> indefinitely, output from time much later:
> 
> elapsed time: 0.174122929 seconds (36239160 bytes allocated)
> elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc
> time)
> elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc
> time)
> elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc
> time)
> elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc
> time)
> elapsed time: 0.169849999 seconds (36224200 bytes allocated)
> 
> The inner loop, despite iterating over fewer and fewer indices has
> massively increased the gc, and therefore overall, execution time. I have
> tried many things, including creating the compute function, devectorizing
> the ratios calculation (which really slowed things down), using view and
> sub in various places, profiling with --trace-allocation=all but I can't
> figure out why this happens or how to fix it. Doing gc for 22 seconds
> obviously kills the performance, and since there are about 22M iterations
> this is prohibitive. Can anyone suggest something I can try to improve the
> performance here?
> 
> Thanks,
> Adam

Reply via email to