I have not gone through your post in detail, but would like to point out that a SharedArray can only be used for bitstypes.
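For example, a SharedArray of Ints is fine, but an element type that is not a bitstype (strings, Dicts, composite types with untyped fields) cannot be backed by shared memory. A minimal sketch, using the 0.2-era constructor:

```julia
# Works: Int is a bitstype, so its elements can live directly in shared memory.
s = SharedArray(Int, 10)
s[1] = 42

# Does not work: ASCIIString is not a bitstype, so this throws an error.
# SharedArray(ASCIIString, 10)

# You can check a type with isbits:
isbits(Int)          # true
isbits(ASCIIString)  # false
```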
On Wed, Jan 22, 2014 at 12:23 PM, Madeleine Udell <[email protected]> wrote:

> # Say I have a list of tasks, eg tasks i=1:n
> # For each task I want to call a function foo
> # that depends on that task and some fixed data
> # I have many types of fixed data: eg, arrays, dictionaries, integers, etc
>
> # Imagine the data comes from eg loading a file based on user input,
> # so we can't hard code the data into the function foo
> # although it's constant during program execution
>
> # If I were doing this in serial, I'd do the following
>
> type MyData
>     myint
>     mydict
>     myarray
> end
>
> function foo(task,data::MyData)
>     data.myint + data.myarray[data.mydict[task]]
> end
>
> n = 10
> const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
>
> results = zeros(n)
> for i = 1:n
>     results[i] = foo(i,data)
> end
>
> # What's the right way to do this in parallel? Here are a number of ideas
> # To use @parallel or pmap, we have to first copy all the code and data everywhere
> # I'd like to avoid that, since the data is huge (10 - 100 GB)
>
> @everywhere begin
>     type MyData
>         myint
>         mydict
>         myarray
>     end
>
>     function foo(task,data::MyData)
>         data.myint + data.myarray[data.mydict[task]]
>     end
>
>     n = 10
>     const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
> end
>
> ## @parallel
> results = zeros(n)
> @parallel for i = 1:n
>     results[i] = foo(i,data)
> end
>
> ## pmap
> @everywhere foo(task) = foo(task,data)
> results = pmap(foo,1:n)
>
> # To avoid copying data, I can make myarray a shared array
> # In that case, I don't want to use @everywhere to put data on each processor
> # since that would reinstantiate the shared array.
> # My current solution is to rewrite my data structure to *not* include myarray,
> # and pass the array to the function foo separately.
> # But the code gets much less pretty as I tear apart my data structure,
> # especially if I have a large number of shared arrays.
> # Is there a way for me to avoid this while using shared memory?
> # really, I'd like to be able to define my own shared memory data types...
>
> @everywhere begin
>     type MySmallerData
>         myint
>         mydict
>     end
>
>     function foo(task,data::MySmallerData,myarray::SharedArray)
>         data.myint + myarray[data.mydict[task]]
>     end
>
>     n = 10
>     const data = MySmallerData(rand(),Dict(1:n,randperm(n)))
> end
>
> myarray = SharedArray(randperm(n))
>
> ## @parallel
> results = zeros(n)
> @parallel for i = 1:n
>     results[i] = foo(i,data,myarray)
> end
>
> ## pmap
> @everywhere foo(task) = foo(task,data,myarray)
> results = pmap(foo,1:n)
>
> # Finally, what can I do to avoid copying mydict to each processor?
> # Is there a way to use shared memory for it?
> # Once again, I'd really like to be able to define my own shared memory data types...
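On the "tearing apart my data structure" point: nothing prevents a composite type from holding a SharedArray field, so the original layout can be kept as long as the big array's element type is a bitstype. A sketch under that assumption (MySharedData is a hypothetical name; the wrapper and the Dict are still serialized to each worker, only the array's buffer is shared):

```julia
type MySharedData
    myint
    mydict
    myarray::SharedArray   # only this field's buffer lives in shared memory
end

function foo(task, data::MySharedData)
    # Same body as the original foo; no need to pass myarray separately.
    data.myint + data.myarray[data.mydict[task]]
end

n = 10
# Share only the large array; the small fields are copied as before.
myarray = convert(SharedArray, randperm(n))
data = MySharedData(rand(), Dict(1:n, randperm(n)), myarray)
```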
