Out of curiosity, what happens if you take the compute! function out of main 
and pass ratios (and any other needed variables) as another argument?

--Tim

On Monday, April 06, 2015 09:14:24 PM Adam Labadorf wrote:
> Thanks for the replies. I took your suggestions (and reread the scope
> section of the docs) and am still experiencing the gc creep. Below is the
> complete program, with the notable changes that I wrapped the main
> computation in a function and eliminated all references to global variables
> inside. I'm also using the most recent nightly build of 0.4. Overall this
> version of the code is much faster, but there is still significant slowdown
> as the computation progresses. Is this expected? Do you see anything I'm
> doing wrong?
> 
> # julia v0.4.0-dev+4159
> using HypothesisTests, ArrayViews
> 
> type Result
>   pvalue::Float64
>   i::Int64
>   j::Int64
> end
> 
> function readtable(fn)
>   fp = readcsv(fn)
>   columns = fp[1,2:end]
>   rows = fp[:,2:end]
>   data = float(fp[2:end,2:end])
>   return (columns,rows,data)
> end
> @time (cols, genes, counts) =
> readtable("../all_mRNA_nonzero_norm_counts_trim.csv")
> 
> h_cols = find([c[1] == 'H' for c in cols])
> c_cols = find([c[1] == 'C' for c in cols])
> 
> # filter out genes with zeros since it messes with the ratios
> nonzero = mapslices(x -> !any(x.==0),counts,2)
> counts = counts[find(nonzero),[h_cols;c_cols]]
> 
> # slices seem to be faster
> h_cols = 1:length(h_cols)
> c_cols = (length(h_cols)+1):size(counts,2)
> 
> # arrays are stored in column order, transpose counts to make
> # accessing more efficient
> counts = transpose(counts)
> 
> genes = genes[find(nonzero)]
> 
> function main(counts,genes,h_cols,c_cols)
> 
>   N = size(genes,1)
>   M = size(counts,1)
> 
>   ratios = Array(Float64,M)
>   function
> compute!(S::Array{Result,1},counts::Array{Float64,2},tot_i::Int64,i::Int64,j
> ::Int64,h_cols::UnitRange{Int64},c_cols::UnitRange{Int64},M::Int64) for
> k=1:M
>         ratios[k] = counts[k,i]/counts[k,j]
>       end
>       t = UnequalVarianceTTest(view(ratios,h_cols),view(ratios,c_cols))
>       S[tot_i] = Result(pvalue(t),i,j)
>   end
> 
>   tot_i = 0
>   tot = (N^2-N)/2
> 
>   S = Array(Result,round(Int,tot))
> 
>   for i=1:N-1
>     @time for j=(i+1):N
>       tot_i += 1
>       compute!(S,counts,tot_i,i,j,h_cols,c_cols,M)
>     end
>   end
> 
> end
> 
> S = main(counts,genes,h_cols,c_cols)
> 
> 
> And the output:
> 
> elapsed time: 0.427719149 seconds (23 MB allocated, 39.90% gc time in 2
> pauses with 0 full sweep)
> elapsed time: 0.031006382 seconds (14 MB allocated)
> elapsed time: 0.131579099 seconds (14 MB allocated, 73.64% gc time in 1
> pauses with 1 full sweep)
> elapsed time: 0.140120717 seconds (14 MB allocated, 73.58% gc time in 1
> pauses with 0 full sweep)
> elapsed time: 0.030248237 seconds (14 MB allocated)
> ...
> elapsed time: 0.507894781 seconds (5 MB allocated, 97.65% gc time in 1
> pauses with 0 full sweep)
> elapsed time: 0.011821657 seconds (5 MB allocated)
> elapsed time: 0.011610651 seconds (5 MB allocated)
> elapsed time: 0.011816277 seconds (5 MB allocated)
> elapsed time: 0.50779098 seconds (5 MB allocated, 97.65% gc time in 1
> pauses with 0 full sweep)
> elapsed time: 0.011997168 seconds (5 MB allocated)
> elapsed time: 0.011721667 seconds (5 MB allocated)
> elapsed time: 0.011561071 seconds (5 MB allocated)
> 
> On Saturday, April 4, 2015 at 12:38:46 PM UTC-4, Patrick O'Leary wrote:
> > Silly me, ignoring all the commented out lines assuming they were
> > comments...yes, this is almost certainly it.
> > 
> > On Saturday, April 4, 2015 at 3:24:50 AM UTC-5, Tim Holy wrote:
> >> Devectorization should never slow anything down. If it does, then you
> >> have
> >> some other problem. Here, M is a global variable, and that's probably
> >> what's
> >> killing you. Performance tip #1:
> >> http://docs.julialang.org/en/latest/manual/performance-tips/
> >> 
> >> --Tim
> >> 
> >> On Friday, April 03, 2015 09:43:51 AM Adam Labadorf wrote:
> >> > Hi,
> >> > 
> >> > I am struggling with an issue related to garbage collection taking up
> >> 
> >> the
> >> 
> >> > vast majority (>99%) of compute time on a simple nested for loop. Code
> >> > excerpt below:
> >> > 
> >> > # julia version 0.3.7
> >> > # counts is an MxN matrix of Float64
> >> > # N=15000
> >> > # M=108
> >> > # h_cols and c_cols are indices \in {1:M}
> >> > using HypothesisTests, ArrayViews
> >> > ratios = Array(Float64,M)
> >> > function compute!(S,tot_i::Int64,i::Int64,j::Int64)
> >> > 
> >> >     ratios = view(counts,:,i)./view(counts,:,j)
> >> >     #for k=1:M
> >> >     #  ratios[k] = counts[k,i]/counts[k,j]
> >> >     #end
> >> >     #ratios = counts[:,i]./counts[:,j]
> >> >     t = UnequalVarianceTTest(ratios[h_cols],ratios[c_cols])
> >> >     S[tot_i] = (pvalue(t),i,j)
> >> > 
> >> > end
> >> > 
> >> > for i=1:N-1
> >> > 
> >> >   @time for j=(i+1):N
> >> >   
> >> >     tot_i += 1
> >> >     compute!(S,tot_i,i,j)
> >> >   
> >> >   end
> >> > 
> >> > end
> >> > 
> >> > The loop begins fast, output from time:
> >> > 
> >> > elapsed time: 1.023850054 seconds (62027220 bytes allocated)
> >> > elapsed time: 0.170916977 seconds (45785624 bytes allocated)
> >> > elapsed time: 0.171598156 seconds (45782760 bytes allocated)
> >> > elapsed time: 0.173866309 seconds (45779896 bytes allocated)
> >> > elapsed time: 0.170267172 seconds (45777032 bytes allocated)
> >> > elapsed time: 0.171754713 seconds (45774168 bytes allocated)
> >> > elapsed time: 0.170110142 seconds (45771304 bytes allocated)
> >> > elapsed time: 0.175199053 seconds (45768440 bytes allocated)
> >> > elapsed time: 0.179893161 seconds (45765576 bytes allocated)
> >> > elapsed time: 0.212172824 seconds (45762712 bytes allocated)
> >> > elapsed time: 0.252750549 seconds (45759848 bytes allocated)
> >> > elapsed time: 0.254874855 seconds (45756984 bytes allocated)
> >> > elapsed time: 0.231003319 seconds (45754120 bytes allocated)
> >> > elapsed time: 0.235060195 seconds (45751256 bytes allocated)
> >> > elapsed time: 0.235379355 seconds (45748392 bytes allocated)
> >> > elapsed time: 0.927622743 seconds (45746168 bytes allocated, 77.65% gc
> >> 
> >> time)
> >> 
> >> > elapsed time: 0.9132931 seconds (45742664 bytes allocated, 78.35% gc
> >> 
> >> time)
> >> 
> >> > But as soon as it starts doing gc the % time spent in increases almost
> >> > indefinitely, output from time much later:
> >> > 
> >> > elapsed time: 0.174122929 seconds (36239160 bytes allocated)
> >> > elapsed time: 18.535572658 seconds (36236168 bytes allocated, 99.22% gc
> >> > time)
> >> > elapsed time: 19.189478819 seconds (36233176 bytes allocated, 99.26% gc
> >> > time)
> >> > elapsed time: 21.812889439 seconds (36230184 bytes allocated, 99.35% gc
> >> > time)
> >> > elapsed time: 22.182467723 seconds (36227192 bytes allocated, 99.30% gc
> >> > time)
> >> > elapsed time: 0.169849999 seconds (36224200 bytes allocated)
> >> > 
> >> > The inner loop, despite iterating over fewer and fewer indices has
> >> > massively increased the gc, and therefore overall, execution time. I
> >> 
> >> have
> >> 
> >> > tried many things, including creating the compute function,
> >> 
> >> devectorizing
> >> 
> >> > the ratios calculation (which really slowed things down), using view
> >> 
> >> and
> >> 
> >> > sub in various places, profiling with --trace-allocation=all but I
> >> 
> >> can't
> >> 
> >> > figure out why this happens or how to fix it. Doing gc for 22 seconds
> >> > obviously kills the performance, and since there are about 22M
> >> 
> >> iterations
> >> 
> >> > this is prohibitive. Can anyone suggest something I can try to improve
> >> 
> >> the
> >> 
> >> > performance here?
> >> > 
> >> > Thanks,
> >> > Adam

Reply via email to