I have no experience with it, but it looks like you could also just do: Ans1 = SharedArray(Float64, (limit, int64(limit/2)) Ans2 = SharedArray(Float64, (limit, int64(limit/2))
@parallel for sample=1:samples, i=1:limit, j=1:int64(limit/2) Sx = S[i, sample] Sy = S[j, sample] Sxy = S[i+j, sample] ... Ans1[i,j] = Aix * Bix / samples / samples Ans2[i,j] = Cix / samples end return (Ans1, Ans2) On Mon, Feb 3, 2014 at 8:48 AM, David Salamon <[email protected]> wrote: > Also S[:,1] is allocating. it should look something like: > > for sample=1:samples, i=1:limit, j=1:int64(limit/2) > Sx = S[i, sample] > Sy = S[j, sample] > Sxy = S[i+j, sample] > ... > end > > > On Mon, Feb 3, 2014 at 8:45 AM, David Salamon <[email protected]> wrote: > >> You're not out of the no-slicing woods yet. Looks like you can get rid of >> `mx` and `my` >> >> for i=1:limit, j=1:int64(limit/2) >> end >> >> >> >> As far as parallelizing, you could define: >> three_tup_add(a, b, c) = (a[1] + b[1] + c[1], a[2] + b[2] + c[2], a[3] + >> b[3] + c[3]) >> >> and then do a @parallel (three_tup_add) over your sample index? >> >> for that matter, why not compute the two parts of the answer directly >> rather than going via A, B, and C? >> >> >> >> >> >> >> On Mon, Feb 3, 2014 at 8:11 AM, Alex C <[email protected]> wrote: >> >>> Thanks. I've re-written the function to minimize the amount of copying >>> (i.e. slicing) that is required. But now, I'm befuddled as to how to >>> parallelize this function using Julia. Any suggestions? >>> >>> Alex >>> >>> function expensive_hat(S::Array{Complex{Float64},2}, mx::Array{Int64,2}, >>> my::Array{Int64,2}) >>> >>> samples = 64 >>> A = zeros(size(mx)); >>> B = zeros(size(mx)); >>> C = zeros(size(mx)); >>> >>> for i = 1:samples >>> Si = S[:,i]; >>> Sx = Si[mx]; >>> Sy = Si[my]; >>> Sxy = Si[mx+my]; >>> Sxyc = conj(Sxy); >>> >>> A += abs2(Sy .* Sx); >>> B += abs2(sqrt(Sxyc .* Sxy)); >>> C += Sxyc .* Sy .* Sx; >>> end >>> >>> ans = (A .* B ./ samples ./ samples, C./samples) >>> return ans >>> >>> end >>> >>> data = rand(24000,64); >>> limit = 2000; >>> >>> ix = int64([1:limit/2]); >>> iy = ix[1:end/2]; >>> mg = zeros(Int64,length(iy),length(ix)); >>> mx = broadcast(+,ix',mg); >>> my = broadcast(+,iy,mg); >>> S = rfft(data,1)./24000; >>> >>> @time (AB, C) = expensive_hat(S,mx,my); >>> >>> >> >> >
