Thanks Amit -- I think you just saved future me a lot of frustration :)
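
For the archive, here is a minimal sketch of the pattern Amit describes below. It assumes the Julia syntax used in this thread (0.3-era `SharedArray(Float64, dims)` and `@parallel`; later releases renamed the macro to `@distributed`). The array name `Ans`, the small `limit`, and the loop body are placeholders for illustration, not the real computation.

```julia
# Julia 0.3-era syntax, as used in this thread; start with `julia -p N` to get workers.
limit = 8                                    # hypothetical small size, for illustration only
Ans = SharedArray(Float64, (limit, limit))   # memory shared by the local worker processes

# Without a reducer, @parallel returns immediately; @sync waits for the loop to finish.
@time @sync begin
    @parallel for i = 1:limit
        for j = 1:limit
            Ans[i, j] = i * j                # placeholder for the real per-cell work
        end
    end
end

# Only read the result after the @sync block has returned.
println(Ans[3, 4])
```

Without the `@sync`, `@time` would only measure how long it takes to launch the work, and `Ans` could still be partly unfilled when it is read.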

On Mon, Feb 3, 2014 at 7:27 PM, Amit Murthy wrote:

> Would like to mention that the non-reducer version of @parallel is
> asynchronous. Before you can use Ans1 and Ans2, you should wait for
> completion.
>
> For example, if you need to time it, you can wrap it in a @sync block like
> this:
>
> @time @sync begin
>    @parallel .....
>       ....
>    end
> end
>
>
> On Mon, Feb 3, 2014 at 10:25 PM, David Salamon wrote:
>
>> I have no experience with it, but it looks like you could also just do:
>>
>> Ans1 = SharedArray(Float64, (limit, int64(limit/2)))
>> Ans2 = SharedArray(Float64, (limit, int64(limit/2)))
>>
>> @parallel for sample=1:samples, i=1:limit, j=1:int64(limit/2)
>>    Sx = S[i, sample]
>>    Sy = S[j, sample]
>>    Sxy = S[i+j, sample]
>>    ...
>>
>>    Ans1[i,j] = Aix * Bix / samples / samples
>>    Ans2[i,j] = Cix / samples
>> end
>>
>> return (Ans1, Ans2)
>>
>>
>> On Mon, Feb 3, 2014 at 8:48 AM, David Salamon wrote:
>>
>>> Also, S[:,i] allocates a copy. It should look something like:
>>>
>>> for sample=1:samples, i=1:limit, j=1:int64(limit/2)
>>>    Sx = S[i, sample]
>>>    Sy = S[j, sample]
>>>    Sxy = S[i+j, sample]
>>>    ...
>>> end
>>>
>>>
>>> On Mon, Feb 3, 2014 at 8:45 AM, David Salamon wrote:
>>>
>>>> You're not out of the no-slicing woods yet. Looks like you can get rid
>>>> of `mx` and `my` by looping over the indices directly:
>>>>
>>>> for i=1:limit, j=1:int64(limit/2)
>>>> end
>>>>
>>>>
>>>>
>>>> As far as parallelizing, you could define:
>>>> three_tup_add(a, b) = (a[1] + b[1], a[2] + b[2], a[3] + b[3])
>>>>
>>>> and then do a @parallel (three_tup_add) over your sample index?
>>>>
>>>> For that matter, why not compute the two parts of the answer directly
>>>> rather than going via A, B, and C?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Feb 3, 2014 at 8:11 AM, Alex C wrote:
>>>>
>>>>> Thanks. I've re-written the function to minimize the amount of copying
>>>>> (i.e. slicing) that is required. But now, I'm befuddled as to how to
>>>>> parallelize this function using Julia. Any suggestions?
>>>>>
>>>>> Alex
>>>>>
>>>>> function expensive_hat(S::Array{Complex{Float64},2},
>>>>> mx::Array{Int64,2}, my::Array{Int64,2})
>>>>>
>>>>>     samples = 64
>>>>>     A = zeros(size(mx));
>>>>>     B = zeros(size(mx));
>>>>>     C = zeros(size(mx));
>>>>>
>>>>>     for i = 1:samples
>>>>>         Si = S[:,i];
>>>>>         Sx = Si[mx];
>>>>>         Sy = Si[my];
>>>>>         Sxy = Si[mx+my];
>>>>>         Sxyc = conj(Sxy);
>>>>>
>>>>>         A += abs2(Sy .* Sx);
>>>>>         B += abs2(sqrt(Sxyc .* Sxy));
>>>>>         C += Sxyc .* Sy .* Sx;
>>>>>     end
>>>>>
>>>>>     ans = (A .* B ./ samples ./ samples, C./samples)
>>>>>     return ans
>>>>>
>>>>> end
>>>>>
>>>>> data = rand(24000,64);
>>>>> limit = 2000;
>>>>>
>>>>> ix = int64([1:limit/2]);
>>>>> iy = ix[1:end/2];
>>>>> mg = zeros(Int64,length(iy),length(ix));
>>>>> mx = broadcast(+,ix',mg);
>>>>> my = broadcast(+,iy,mg);
>>>>> S = rfft(data,1)./24000;
>>>>>
>>>>> @time (AB, C) = expensive_hat(S,mx,my);
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
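
On the reducer idea quoted above, a rough sketch in the same thread-era syntax. As far as I understand it, the function given to `@parallel` is called with two partial results at a time, so it is written here with two arguments. The per-sample tuple is a made-up stand-in; the real body would compute the A, B and C contributions from `S`.

```julia
# Julia 0.3-era syntax. Binary reducer: element-wise sum of two 3-tuples.
three_tup_add(a, b) = (a[1] + b[1], a[2] + b[2], a[3] + b[3])

samples = 64

# The last expression of the loop body is the value handed to the reducer.
# The tuple below is a hypothetical placeholder for the real per-sample terms.
totals = @parallel (three_tup_add) for sample = 1:samples
    (1.0, 2.0 * sample, sample ^ 2)
end

println(totals)
```

Unlike the non-reducer form, this version hands back the reduced value directly, so there is nothing extra to wait on before using `totals`.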
