Thanks Amit -- I think you just saved future me a lot of frustration :)
On Mon, Feb 3, 2014 at 7:27 PM, Amit Murthy <[email protected]> wrote: > Would like to mention that the non-reducer version of @parallel is > asynchronous. Before you can use Ans1 and Ans2, you should wait for > completion. > > For example, if you need to time it, you can wrap it in a @sync block like > this: > > @time @sync begin > @parallel ..... > .... > end > end > > > On Mon, Feb 3, 2014 at 10:25 PM, David Salamon <[email protected]> wrote: > >> I have no experience with it, but it looks like you could also just do: >> >> Ans1 = SharedArray(Float64, (limit, int64(limit/2)) >> Ans2 = SharedArray(Float64, (limit, int64(limit/2)) >> >> @parallel for sample=1:samples, i=1:limit, j=1:int64(limit/2) >> Sx = S[i, sample] >> Sy = S[j, sample] >> Sxy = S[i+j, sample] >> ... >> >> Ans1[i,j] = Aix * Bix / samples / samples >> Ans2[i,j] = Cix / samples >> end >> >> return (Ans1, Ans2) >> >> >> On Mon, Feb 3, 2014 at 8:48 AM, David Salamon <[email protected]> wrote: >> >>> Also S[:,1] is allocating. it should look something like: >>> >>> for sample=1:samples, i=1:limit, j=1:int64(limit/2) >>> Sx = S[i, sample] >>> Sy = S[j, sample] >>> Sxy = S[i+j, sample] >>> ... >>> end >>> >>> >>> On Mon, Feb 3, 2014 at 8:45 AM, David Salamon <[email protected]> wrote: >>> >>>> You're not out of the no-slicing woods yet. Looks like you can get rid >>>> of `mx` and `my` >>>> >>>> for i=1:limit, j=1:int64(limit/2) >>>> end >>>> >>>> >>>> >>>> As far as parallelizing, you could define: >>>> three_tup_add(a, b, c) = (a[1] + b[1] + c[1], a[2] + b[2] + c[2], a[3] >>>> + b[3] + c[3]) >>>> >>>> and then do a @parallel (three_tup_add) over your sample index? >>>> >>>> for that matter, why not compute the two parts of the answer directly >>>> rather than going via A, B, and C? >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Feb 3, 2014 at 8:11 AM, Alex C <[email protected]> wrote: >>>> >>>>> Thanks. I've re-written the function to minimize the amount of copying >>>>> (i.e. slicing) that is required. But now, I'm befuddled as to how to >>>>> parallelize this function using Julia. Any suggestions? >>>>> >>>>> Alex >>>>> >>>>> function expensive_hat(S::Array{Complex{Float64},2}, >>>>> mx::Array{Int64,2}, my::Array{Int64,2}) >>>>> >>>>> samples = 64 >>>>> A = zeros(size(mx)); >>>>> B = zeros(size(mx)); >>>>> C = zeros(size(mx)); >>>>> >>>>> for i = 1:samples >>>>> Si = S[:,i]; >>>>> Sx = Si[mx]; >>>>> Sy = Si[my]; >>>>> Sxy = Si[mx+my]; >>>>> Sxyc = conj(Sxy); >>>>> >>>>> A += abs2(Sy .* Sx); >>>>> B += abs2(sqrt(Sxyc .* Sxy)); >>>>> C += Sxyc .* Sy .* Sx; >>>>> end >>>>> >>>>> ans = (A .* B ./ samples ./ samples, C./samples) >>>>> return ans >>>>> >>>>> end >>>>> >>>>> data = rand(24000,64); >>>>> limit = 2000; >>>>> >>>>> ix = int64([1:limit/2]); >>>>> iy = ix[1:end/2]; >>>>> mg = zeros(Int64,length(iy),length(ix)); >>>>> mx = broadcast(+,ix',mg); >>>>> my = broadcast(+,iy,mg); >>>>> S = rfft(data,1)./24000; >>>>> >>>>> @time (AB, C) = expensive_hat(S,mx,my); >>>>> >>>>> >>>> >>>> >>> >> >
