Re: [julia-users] [Parallel] Using shared memory + parallel maps elegantly

Madeleine Udell Thu, 23 Jan 2014 19:31:19 -0800

Even more problematic: I can't multiply by my SharedArray:

no method *(SharedArray{Float64,2}, Array{Float64,2})
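
A possible workaround, sketched under the assumption of a recent Julia (1.x) where `SharedArrays` is a standard library and exports `sdata`, which returns the ordinary `Array` backing a `SharedArray`. The thread predates that release, so the names and constructor syntax below may not match the version discussed here. The dimensions and values are made up for illustration.

```julia
using SharedArrays

# Hypothetical 3x2 shared matrix, filled column by column with 1.0..6.0
S = SharedArray{Float64}(3, 2)
S[:] = 1:6

B = ones(2, 2)

# Multiply through the backing Array rather than the SharedArray wrapper
P = sdata(S) * B

# Row indexing; in recent versions this goes through the generic AbstractArray fallbacks
row = S[1, :]

# repmat was later renamed to repeat; stack two copies vertically
tall = repeat(S, 2, 1)
```

Going through `sdata` keeps the shared segment itself untouched; the product `P` is a regular, process-local `Array`.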


On Thursday, January 23, 2014 7:22:59 PM UTC-8, Madeleine Udell wrote:
>
> Thanks! I'm trying out a SharedArray solution now, but wondered if you can 
> tell me if there's an easy way to reimplement many of the convenience 
> wrappers on arrays for shared arrays. E.g., I get the following errors:
>
> >> shared_array[1,:]
> no method getindex(SharedArray{Float64,2}, Float64, Range1{Int64})
>
> >> repmat(shared_array,2,1)
> no method similar(SharedArray{Float64,2}, Type{Float64}, (Int64,Int64))
>  in repmat at abstractarray.jl:1043
>
> I'm surprised these aren't inherited properties from AbstractArray!
>
> On Wednesday, January 22, 2014 8:05:45 PM UTC-8, Amit Murthy wrote:
>>
>> 1. The SharedArray object can be sent to any of the processes that mapped 
>> the shared memory segment during construction. The backing array is not 
>> copied.
>> 2. User defined composite types are fine as long as isbits(T) is true.
>>
>>
>>
>> On Thu, Jan 23, 2014 at 1:01 AM, Madeleine Udell <[email protected]> wrote:
>>
>>> That's not a problem for me; all of my data is numeric. To summarize a 
>>> long post, I'm interested in understanding:
>>>
>>> 1) good programming paradigms for using shared memory together with 
>>> parallel maps. In particular, can a shared array and other nonshared data 
>>> structure be combined into a single data structure and "passed" in a remote 
>>> call without unnecessarily copying the shared array? and 
>>> 2) possibilities for extending shared memory in julia to other data 
>>> types, and even to user defined types.
>>>
>>>
>>> On Tuesday, January 21, 2014 11:17:10 PM UTC-8, Amit Murthy wrote:
>>>
>>>> I have not gone through your post in detail, but would like to point 
>>>> out that SharedArray can only be used for bitstypes.
>>>>
>>>>
>>>> On Wed, Jan 22, 2014 at 12:23 PM, Madeleine Udell <[email protected]> wrote:
>>>>
>>>>> # Say I have a list of tasks, eg tasks i=1:n
>>>>> # For each task I want to call a function foo
>>>>> # that depends on that task and some fixed data
>>>>> # I have many types of fixed data: e.g., arrays, dictionaries, integers, etc.
>>>>>
>>>>> # Imagine the data comes from eg loading a file based on user input,
>>>>> # so we can't hard code the data into the function foo 
>>>>> # although it's constant during program execution
>>>>>
>>>>> # If I were doing this in serial, I'd do the following
>>>>>
>>>>> type MyData
>>>>>     myint
>>>>>     mydict
>>>>>     myarray
>>>>> end
>>>>>
>>>>> function foo(task,data::MyData)
>>>>>     data.myint + data.myarray[data.mydict[task]]
>>>>> end
>>>>>
>>>>> n = 10
>>>>> const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
>>>>>
>>>>> results = zeros(n)
>>>>> for i = 1:n
>>>>>     results[i] = foo(i,data)
>>>>> end
>>>>>
>>>>> # What's the right way to do this in parallel? Here are a number of ideas.
>>>>> # To use @parallel or pmap, we have to first copy all the code and data everywhere.
>>>>> # I'd like to avoid that, since the data is huge (10 - 100 GB)
>>>>>
>>>>> @everywhere begin
>>>>>     type MyData
>>>>>         myint
>>>>>         mydict
>>>>>         myarray
>>>>>     end
>>>>>
>>>>>     function foo(task,data::MyData)
>>>>>         data.myint + data.myarray[data.mydict[task]]
>>>>>     end
>>>>>
>>>>>     n = 10
>>>>>     const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
>>>>> end
>>>>>
>>>>> ## @parallel
>>>>> results = zeros(n)
>>>>> @parallel for i = 1:n
>>>>>     results[i] = foo(i,data)
>>>>> end
>>>>>
>>>>> ## pmap
>>>>> @everywhere foo(task) = foo(task,data)
>>>>> results = pmap(foo,1:n)
>>>>>
>>>>> # To avoid copying data, I can make myarray a shared array.
>>>>> # In that case, I don't want to use @everywhere to put data on each processor,
>>>>> # since that would reinstantiate the shared array.
>>>>> # My current solution is to rewrite my data structure to *not* include myarray,
>>>>> # and pass the array to the function foo separately.
>>>>> # But the code gets much less pretty as I tear apart my data structure,
>>>>> # especially if I have a large number of shared arrays.
>>>>> # Is there a way for me to avoid this while using shared memory?
>>>>> # Really, I'd like to be able to define my own shared memory data types...
>>>>>
>>>>> @everywhere begin
>>>>>     type MySmallerData
>>>>>         myint
>>>>>         mydict
>>>>>     end
>>>>>
>>>>>     function foo(task,data::MySmallerData,myarray::SharedArray)
>>>>>         data.myint + myarray[data.mydict[task]]
>>>>>     end
>>>>>
>>>>>     n = 10
>>>>>     const data = MySmallerData(rand(),Dict(1:n,randperm(n)))
>>>>> end
>>>>>
>>>>> myarray = SharedArray(randperm(n))
>>>>>
>>>>> ## @parallel
>>>>> results = zeros(n)
>>>>> @parallel for i = 1:n
>>>>>     results[i] = foo(i,data,myarray)
>>>>> end
>>>>>
>>>>> ## pmap
>>>>> @everywhere foo(task) = foo(task,data,myarray)
>>>>> results = pmap(foo,1:n)
>>>>>
>>>>> # Finally, what can I do to avoid copying mydict to each processor?
>>>>> # Is there a way to use shared memory for it?
>>>>> # Once again, I'd really like to be able to define my own shared memory data types...
>>>>>
>>>>
>>>>
>>
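
One pattern that might address the first question in the thread, sketched under the assumption of a recent Julia (1.x) with the `Distributed` and `SharedArrays` standard libraries, not the version discussed above. It relies on the point made in the thread that a `SharedArray` can be sent to any process that mapped the segment without copying the backing array, so a composite type holding one can be captured in the closure passed to `pmap`. The non-shared fields (`myint`, `mydict`) are still serialized to the workers. The worker count of 2 is arbitrary, and the typed fields are an illustrative choice.

```julia
using Distributed
addprocs(2)                      # arbitrary number of local workers
@everywhere using SharedArrays
using Random

# Define the type and function on every process; the data itself is built once, on the master
@everywhere struct MyData
    myint::Float64
    mydict::Dict{Int,Int}
    myarray::SharedArray{Int,1}
end

@everywhere foo(task, data::MyData) = data.myint + data.myarray[data.mydict[task]]

n = 10
arr = SharedArray{Int}(n)        # mapped by the local workers at construction
arr[:] = randperm(n)

data = MyData(rand(), Dict(zip(1:n, randperm(n))), arr)

# The closure carries `data` to the workers; the shared segment is referenced, not copied
results = pmap(i -> foo(i, data), 1:n)
```

Because `data` is constructed only on the master and shipped with the closure, the shared array is not reinstantiated on each worker the way it would be if the constructor ran inside `@everywhere`.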
