You're not out of the no-slicing woods yet. Looks like you can get rid of
`mx` and `my`

for i=1:limit, j=1:int64(limit/2)
end



As far as parallelizing, you could define:
three_tup_add(a, b, c) = (a[1] + b[1] + c[1], a[2] + b[2] + c[2], a[3] +
b[3] + c[3])

and then do a @parallel (three_tup_add) over your sample index?

for that matter, why not compute the two parts of the answer directly
rather than going via A, B, and C?






On Mon, Feb 3, 2014 at 8:11 AM, Alex C <[email protected]> wrote:

> Thanks. I've re-written the function to minimize the amount of copying
> (i.e. slicing) that is required. But now, I'm befuddled as to how to
> parallelize this function using Julia. Any suggestions?
>
> Alex
>
> function expensive_hat(S::Array{Complex{Float64},2}, mx::Array{Int64,2},
> my::Array{Int64,2})
>
>     samples = 64
>         A = zeros(size(mx));
>     B = zeros(size(mx));
>     C = zeros(size(mx));
>
>     for i = 1:samples
>         Si = S[:,i];
>         Sx = Si[mx];
>         Sy = Si[my];
>         Sxy = Si[mx+my];
>         Sxyc = conj(Sxy);
>
>                 A +=  abs2(Sy .* Sx);
>         B += abs2(sqrt(Sxyc .* Sxy));
>         C += Sxyc .* Sy .* Sx;
>     end
>
>         ans = (A .* B ./ samples ./ samples, C./samples)
>     return ans
>
> end
>
> data = rand(24000,64);
> limit = 2000;
>
> ix = int64([1:limit/2]);
> iy = ix[1:end/2];
> mg = zeros(Int64,length(iy),length(ix));
> mx = broadcast(+,ix',mg);
> my = broadcast(+,iy,mg);
> S = rfft(data,1)./24000;
>
> @time (AB, C) = expensive_hat(S,mx,my);
>
>

Reply via email to