On Tue, 2015-04-07 at 06:14, Adam Labadorf <alabad...@gmail.com> wrote:
> Thanks for the replies. I took your suggestions (and reread the scope 
> section of the docs) and am still experiencing the gc creep. Below is the 
> complete program, with the notable changes that I wrapped the main 
> computation in a function and eliminated all references to global variables 
> inside. I'm also using the most recent nightly build of 0.4. Overall this 
> version of the code is much faster, but there is still significant slowdown 
> as the computation progresses. Is this expected? Do you see anything I'm 
> doing wrong?
>
> # julia v0.4.0-dev+4159
> using HypothesisTests, ArrayViews
>
> type Result
>   pvalue::Float64
>   i::Int64
>   j::Int64
> end
>
> function readtable(fn)
>   fp = readcsv(fn)
>   columns = fp[1,2:end]
>   rows = fp[:,2:end]
>   data = float(fp[2:end,2:end])
>   return (columns,rows,data)
> end
> @time (cols, genes, counts) = 
> readtable("../all_mRNA_nonzero_norm_counts_trim.csv")
>
> h_cols = find([c[1] == 'H' for c in cols])
> c_cols = find([c[1] == 'C' for c in cols])
>
> # filter out genes with zeros since it messes with the ratios
> nonzero = mapslices(x -> !any(x.==0),counts,2)
> counts = counts[find(nonzero),[h_cols;c_cols]]
>
> # slices seem to be faster
> h_cols = 1:length(h_cols)
> c_cols = (length(h_cols)+1):size(counts,2)
>
> # arrays are stored in column order, transpose counts to make
> # accessing more efficient
> counts = transpose(counts)
>
> genes = genes[find(nonzero)]
>
> function main(counts,genes,h_cols,c_cols)
>
>   N = size(genes,1)
>   M = size(counts,1)
>
>   ratios = Array(Float64,M)
>   function 
> compute!(S::Array{Result,1},counts::Array{Float64,2},tot_i::Int64,i::Int64,j::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64)
>       for k=1:M
>         ratios[k] = counts[k,i]/counts[k,j]
>       end
>       t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols))
>       S[tot_i] = Result(pvalue(t),i,j)
>   end

Sadly, nested function are often bad as type-inference does not work
properly, as Tim suggested. Consider this example:

function a(n)
    aa(x,y) = x*y
    out = 0
    for i=1:n
        out += aa(i,i)
    end
    out
end

bb(x,y) = x*y
function b(n)
    out = 0
    for i=1:n
        out += bb(i,i)
    end
    out
end
n = 10^7
@time a(n)
@time a(n)  # elapsed time: 0.680312065 seconds (1220 MB allocated, 2.71% gc 
time in 56 pauses with 0 full sweep)
b(n)
@time b(n)  # elapsed time: 3.086e-6 seconds (192 bytes allocated)

@code_warntype a(n) # see how the return type of the function is not inferred!
@code_warntype b(n)

Move compute! out of main and it should be better.


>   tot_i = 0
>   tot = (N^2-N)/2
>
>   S = Array(Result,round(Int,tot))
>
>   for i=1:N-1
>     @time for j=(i+1):N
>       tot_i += 1
>       compute!(S,counts,tot_i,i,j,h_cols,c_cols,M)
>     end
>   end
>
> end
>
> S = main(counts,genes,h_cols,c_cols)
>
>
> And the output:
>
> elapsed time: 0.427719149 seconds (23 MB allocated, 39.90% gc time in 2 
> pauses with 0 full sweep)
> elapsed time: 0.031006382 seconds (14 MB allocated)
> elapsed time: 0.131579099 seconds (14 MB allocated, 73.64% gc time in 1 
> pauses with 1 full sweep)
> elapsed time: 0.140120717 seconds (14 MB allocated, 73.58% gc time in 1 
> pauses with 0 full sweep)
> elapsed time: 0.030248237 seconds (14 MB allocated)
> ...
> elapsed time: 0.507894781 seconds (5 MB allocated, 97.65% gc time in 1 
> pauses with 0 full sweep)
> elapsed time: 0.011821657 seconds (5 MB allocated)
> elapsed time: 0.011610651 seconds (5 MB allocated)
> elapsed time: 0.011816277 seconds (5 MB allocated)
> elapsed time: 0.50779098 seconds (5 MB allocated, 97.65% gc time in 1 
> pauses with 0 full sweep)
> elapsed time: 0.011997168 seconds (5 MB allocated)
> elapsed time: 0.011721667 seconds (5 MB allocated)
> elapsed time: 0.011561071 seconds (5 MB allocated)

This looks ok-ish to me.  The program runs, allocates memory and every
so often the memory is garbage collected.  What is not ok that the gc
runs after only 20MB is allocated and that it takes so long.  But at
that point all the memory is used up, right?  Maybe that is why it takes
so long then?

> On Saturday, April 4, 2015 at 12:38:46 PM UTC-4, Patrick O'Leary wrote:
>>
>> Silly me, ignoring all the commented out lines assuming they were 
>> comments...yes, this is almost certainly it.
>>
>> On Saturday, April 4, 2015 at 3:24:50 AM UTC-5, Tim Holy wrote:
>>>
>>> Devectorization should never slow anything down. If it does, then you 
>>> have 
>>> some other problem. Here, M is a global variable, and that's probably 
>>> what's 
>>> killing you. Performance tip #1: 
>>> http://docs.julialang.org/en/latest/manual/performance-tips/ 
>>>
>>> --Tim 
>>>
>>> On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote: 
>>> > Hi, 
>>> > 
>>> > I am struggling with an issue related to garbage collection taking up 
>>> the 
>>> > vast majority (>99%) of compute time on a simple nested for loop. Code 
>>> > excerpt below: 
>>> > 
>>> > # julia version 0.3.7 
>>> > # counts is an MxN matrix of Float64 
>>> > # N=15000 
>>> > # M=108 
>>> > # h_cols and c_cols are indices \in {1:M} 
>>> > using HypothesisTests, ArrayViews 
>>> > ratios = Array(Float64,M) 
>>> > function compute!(S,tot_i::Int64,i::Int64,j::Int64) 
>>> >     ratios = view(counts,:,i)./view(counts,:,j) 
>>> >     #for k=1:M 
>>> >     #  ratios[k] = counts[k,i]/counts[k,j] 
>>> >     #end 
>>> >     #ratios = counts[:,i]./counts[:,j] 
>>> >     t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols]) 
>>> >     S[tot_i] = (pvalue(t),i,j) 
>>> > end 
>>> > 
>>> > for i=1:N-1 
>>> >   @time for j=(i+1):N 
>>> >     tot_i += 1 
>>> >     compute!(S,tot_i,i,j) 
>>> >   end 
>>> > end 
>>> > 
>>> > The loop begins fast, output from time: 
>>> > 
>>> > elapsed time: 1.023850054 seconds (62027220 bytes allocated) 
>>> > elapsed time: 0.170916977 seconds (45785624 bytes allocated) 
>>> > elapsed time: 0.171598156 seconds (45782760 bytes allocated) 
>>> > elapsed time: 0.173866309 seconds (45779896 bytes allocated) 
>>> > elapsed time: 0.170267172 seconds (45777032 bytes allocated) 
>>> > elapsed time: 0.171754713 seconds (45774168 bytes allocated) 
>>> > elapsed time: 0.170110142 seconds (45771304 bytes allocated) 
>>> > elapsed time: 0.175199053 seconds (45768440 bytes allocated) 
>>> > elapsed time: 0.179893161 seconds (45765576 bytes allocated) 
>>> > elapsed time: 0.212172824 seconds (45762712 bytes allocated) 
>>> > elapsed time: 0.252750549 seconds (45759848 bytes allocated) 
>>> > elapsed time: 0.254874855 seconds (45756984 bytes allocated) 
>>> > elapsed time: 0.231003319 seconds (45754120 bytes allocated) 
>>> > elapsed time: 0.235060195 seconds (45751256 bytes allocated) 
>>> > elapsed time: 0.235379355 seconds (45748392 bytes allocated) 
>>> > elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc 
>>> time) 
>>> > elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc 
>>> time) 
>>> > 
>>> > But as soon as it starts doing gc the % time spent in increases almost 
>>> > indefinitely, output from time much later: 
>>> > 
>>> > elapsed time: 0.174122929 seconds (36239160 bytes allocated) 
>>> > elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc 
>>> > time) 
>>> > elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc 
>>> > time) 
>>> > elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc 
>>> > time) 
>>> > elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc 
>>> > time) 
>>> > elapsed time: 0.169849999 seconds (36224200 bytes allocated) 
>>> > 
>>> > The inner loop, despite iterating over fewer and fewer indices has 
>>> > massively increased the gc, and therefore overall, execution time. I 
>>> have 
>>> > tried many things, including creating the compute function, 
>>> devectorizing 
>>> > the ratios calculation (which really slowed things down), using view 
>>> and 
>>> > sub in various places, profiling with --trace-allocation=all but I 
>>> can't 
>>> > figure out why this happens or how to fix it. Doing gc for 22 seconds 
>>> > obviously kills the performance, and since there are about 22M 
>>> iterations 
>>> > this is prohibitive. Can anyone suggest something I can try to improve 
>>> the 
>>> > performance here? 
>>> > 
>>> > Thanks, 
>>> > Adam 
>>>
>>>

Reply via email to