That works great. Thanks!

On Thu, Jan 23, 2014 at 8:39 PM, Amit Murthy <amit.mur...@gmail.com> wrote:

> The SharedArray object has a field loc_shmarr which represents the backing
> array, so S.loc_shmarr should work everywhere. But you are right, we need
> to ensure that a SharedArray can be used just like a regular array.
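>
> For example, a minimal sketch (the SharedArray(T, dims) constructor and the
> loc_shmarr field are as on current master, and may change):
>
> S = SharedArray(Float64, (4, 4))   # shared 4x4 array of Float64
> A = rand(4, 4)                     # an ordinary local array
>
> S.loc_shmarr[1, :]                 # slicing works on the backing Array
> S.loc_shmarr * A                   # so does matrix multiplication
> repmat(S.loc_shmarr, 2, 1)         # and the convenience wrappers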
>
>
> On Fri, Jan 24, 2014 at 9:00 AM, Madeleine Udell <
> madeleine.ud...@gmail.com> wrote:
>
>> Even more problematic: I can't multiply by my SharedArray:
>>
>> no method *(SharedArray{Float64,2}, Array{Float64,2})
>>
>>
>> On Thursday, January 23, 2014 7:22:59 PM UTC-8, Madeleine Udell wrote:
>>>
>>> Thanks! I'm trying out a SharedArray solution now, but wondered if you
>>> could tell me whether there's an easy way to reimplement many of the
>>> convenience wrappers on arrays for shared arrays. E.g., I get the
>>> following errors:
>>>
>>> >> shared_array[1,:]
>>> no method getindex(SharedArray{Float64,2}, Float64, Range1{Int64})
>>>
>>> >> repmat(shared_array,2,1)
>>> no method similar(SharedArray{Float64,2}, Type{Float64}, (Int64,Int64))
>>>  in repmat at abstractarray.jl:1043
>>>
>>> I'm surprised these aren't inherited from AbstractArray!
>>>
>>> On Wednesday, January 22, 2014 8:05:45 PM UTC-8, Amit Murthy wrote:
>>>>
>>>> 1. The SharedArray object can be sent to any of the processes that
>>>> mapped the shared memory segment during construction. The backing array is
>>>> not copied.
>>>> 2. User-defined composite types are fine as long as isbits(T) is true.
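>>>>
>>>> A rough sketch of both points (the type and field names here are only
>>>> illustrative, and assume worker 2 was among the processes that mapped
>>>> the segment):
>>>>
>>>> immutable Point        # every field is a bits type
>>>>     x::Float64
>>>>     y::Float64
>>>> end
>>>> isbits(Point)          # true -> usable as a SharedArray element type
>>>>
>>>> S = SharedArray(Float64, (1000,))
>>>> # Shipping S to a worker sends only the lightweight wrapper; the
>>>> # shared segment itself is not copied.
>>>> remotecall_fetch(2, s -> sum(s.loc_shmarr), S)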
>>>>
>>>>
>>>>
>>>> On Thu, Jan 23, 2014 at 1:01 AM, Madeleine Udell
>>>> <madelei...@gmail.com> wrote:
>>>>
>>>>> That's not a problem for me; all of my data is numeric. To summarize a
>>>>> long post, I'm interested in understanding
>>>>>
>>>>> 1) good programming paradigms for using shared memory together with
>>>>> parallel maps. In particular, can a shared array and other non-shared
>>>>> data structures be combined into a single data structure and "passed"
>>>>> in a remote call without unnecessarily copying the shared array? and
>>>>> 2) possibilities for extending shared memory in Julia to other data
>>>>> types, and even to user-defined types.
>>>>>
>>>>>
>>>>> On Tuesday, January 21, 2014 11:17:10 PM UTC-8, Amit Murthy wrote:
>>>>>
>>>>>> I have not gone through your post in detail, but would like to point
>>>>>> out that SharedArray can only be used for bitstypes.
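>>>>>>
>>>>>> For example (a quick sketch):
>>>>>>
>>>>>> isbits(Float64)          # true  -> fine as a SharedArray element type
>>>>>> isbits(Dict{Int,Int})    # false -> cannot be stored in a SharedArray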
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 22, 2014 at 12:23 PM, Madeleine Udell <
>>>>>> madelei...@gmail.com> wrote:
>>>>>>
>>>>>>> # Say I have a list of tasks, e.g. tasks i = 1:n.
>>>>>>> # For each task I want to call a function foo
>>>>>>> # that depends on that task and some fixed data.
>>>>>>> # I have many types of fixed data: e.g. arrays, dictionaries,
>>>>>>> # integers, etc.
>>>>>>>
>>>>>>> # Imagine the data comes from, e.g., loading a file based on user
>>>>>>> # input, so we can't hard-code the data into the function foo,
>>>>>>> # although it's constant during program execution.
>>>>>>>
>>>>>>> # If I were doing this in serial, I'd do the following:
>>>>>>>
>>>>>>> type MyData
>>>>>>>     myint
>>>>>>>     mydict
>>>>>>>     myarray
>>>>>>> end
>>>>>>>
>>>>>>> function foo(task, data::MyData)
>>>>>>>     data.myint + data.myarray[data.mydict[task]]
>>>>>>> end
>>>>>>>
>>>>>>> n = 10
>>>>>>> const data = MyData(rand(), Dict(1:n, randperm(n)), randperm(n))
>>>>>>>
>>>>>>> results = zeros(n)
>>>>>>> for i = 1:n
>>>>>>>     results[i] = foo(i, data)
>>>>>>> end
>>>>>>>
>>>>>>> # What's the right way to do this in parallel? Here are a number of
>>>>>>> # ideas.
>>>>>>> # To use @parallel or pmap, we have to first copy all the code and
>>>>>>> # data everywhere.
>>>>>>> # I'd like to avoid that, since the data is huge (10 - 100 GB).
>>>>>>>
>>>>>>> @everywhere begin
>>>>>>>     type MyData
>>>>>>>         myint
>>>>>>>         mydict
>>>>>>>         myarray
>>>>>>>     end
>>>>>>>
>>>>>>>     function foo(task, data::MyData)
>>>>>>>         data.myint + data.myarray[data.mydict[task]]
>>>>>>>     end
>>>>>>>
>>>>>>>     n = 10
>>>>>>>     const data = MyData(rand(), Dict(1:n, randperm(n)), randperm(n))
>>>>>>> end
>>>>>>>
>>>>>>> ## @parallel
>>>>>>> results = zeros(n)
>>>>>>> @parallel for i = 1:n
>>>>>>>     results[i] = foo(i, data)
>>>>>>> end
>>>>>>>
>>>>>>> ## pmap
>>>>>>> @everywhere foo(task) = foo(task, data)
>>>>>>> results = pmap(foo, 1:n)
>>>>>>>
>>>>>>> # To avoid copying data, I can make myarray a shared array.
>>>>>>> # In that case, I don't want to use @everywhere to put data on each
>>>>>>> # processor, since that would reinstantiate the shared array.
>>>>>>> # My current solution is to rewrite my data structure to *not*
>>>>>>> # include myarray, and pass the array to the function foo separately.
>>>>>>> # But the code gets much less pretty as I tear apart my data
>>>>>>> # structure, especially if I have a large number of shared arrays.
>>>>>>> # Is there a way for me to avoid this while using shared memory?
>>>>>>> # Really, I'd like to be able to define my own shared-memory data
>>>>>>> # types...
>>>>>>>
>>>>>>> @everywhere begin
>>>>>>>     type MySmallerData
>>>>>>>         myint
>>>>>>>         mydict
>>>>>>>     end
>>>>>>>
>>>>>>>     function foo(task, data::MySmallerData, myarray::SharedArray)
>>>>>>>         data.myint + myarray[data.mydict[task]]
>>>>>>>     end
>>>>>>>
>>>>>>>     n = 10
>>>>>>>     const data = MySmallerData(rand(), Dict(1:n, randperm(n)))
>>>>>>> end
>>>>>>>
>>>>>>> myarray = SharedArray(randperm(n))
>>>>>>>
>>>>>>> ## @parallel
>>>>>>> results = zeros(n)
>>>>>>> @parallel for i = 1:n
>>>>>>>     results[i] = foo(i, data, myarray)
>>>>>>> end
>>>>>>>
>>>>>>> ## pmap
>>>>>>> @everywhere foo(task) = foo(task, data, myarray)
>>>>>>> results = pmap(foo, 1:n)
>>>>>>>
>>>>>>> # Finally, what can I do to avoid copying mydict to each processor?
>>>>>>> # Is there a way to use shared memory for it?
>>>>>>> # Once again, I'd really like to be able to define my own
>>>>>>> # shared-memory data types...
>>>>>>>
>>>>>>
>>>>>>
>>>>
>


-- 
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell
