Also S[:,1] is allocating. it should look something like: for sample=1:samples, i=1:limit, j=1:int64(limit/2) Sx = S[i, sample] Sy = S[j, sample] Sxy = S[i+j, sample] ... end
On Mon, Feb 3, 2014 at 8:45 AM, David Salamon <[email protected]> wrote: > You're not out of the no-slicing woods yet. Looks like you can get rid of > `mx` and `my` > > for i=1:limit, j=1:int64(limit/2) > end > > > > As far as parallelizing, you could define: > three_tup_add(a, b, c) = (a[1] + b[1] + c[1], a[2] + b[2] + c[2], a[3] + > b[3] + c[3]) > > and then do a @parallel (three_tup_add) over your sample index? > > for that matter, why not compute the two parts of the answer directly > rather than going via A, B, and C? > > > > > > > On Mon, Feb 3, 2014 at 8:11 AM, Alex C <[email protected]> wrote: > >> Thanks. I've re-written the function to minimize the amount of copying >> (i.e. slicing) that is required. But now, I'm befuddled as to how to >> parallelize this function using Julia. Any suggestions? >> >> Alex >> >> function expensive_hat(S::Array{Complex{Float64},2}, mx::Array{Int64,2}, >> my::Array{Int64,2}) >> >> samples = 64 >> A = zeros(size(mx)); >> B = zeros(size(mx)); >> C = zeros(size(mx)); >> >> for i = 1:samples >> Si = S[:,i]; >> Sx = Si[mx]; >> Sy = Si[my]; >> Sxy = Si[mx+my]; >> Sxyc = conj(Sxy); >> >> A += abs2(Sy .* Sx); >> B += abs2(sqrt(Sxyc .* Sxy)); >> C += Sxyc .* Sy .* Sx; >> end >> >> ans = (A .* B ./ samples ./ samples, C./samples) >> return ans >> >> end >> >> data = rand(24000,64); >> limit = 2000; >> >> ix = int64([1:limit/2]); >> iy = ix[1:end/2]; >> mg = zeros(Int64,length(iy),length(ix)); >> mx = broadcast(+,ix',mg); >> my = broadcast(+,iy,mg); >> S = rfft(data,1)./24000; >> >> @time (AB, C) = expensive_hat(S,mx,my); >> >> > >
