I have not gone through your post in detail, but would like to point out that a SharedArray can only be used for bitstypes.
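For example, a SharedArray of Ints is fine, but an element type that is not a bitstype (strings, Dicts, composite types with untyped fields) cannot be backed by shared memory. A minimal sketch, using the 0.2-era constructor:

```julia
# Works: Int is a bitstype, so its elements can live directly in shared memory.
s = SharedArray(Int, 10)
s[1] = 42

# Does not work: ASCIIString is not a bitstype, so this throws an error.
# SharedArray(ASCIIString, 10)

# You can check a type with isbits:
isbits(Int)          # true
isbits(ASCIIString)  # false
```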
On Wed, Jan 22, 2014 at 12:23 PM, Madeleine Udell <[email protected]> wrote:

> # Say I have a list of tasks, eg tasks i=1:n
> # For each task I want to call a function foo
> # that depends on that task and some fixed data
> # I have many types of fixed data: eg, arrays, dictionaries, integers, etc
>
> # Imagine the data comes from eg loading a file based on user input,
> # so we can't hard code the data into the function foo
> # although it's constant during program execution
>
> # If I were doing this in serial, I'd do the following
>
> type MyData
>     myint
>     mydict
>     myarray
> end
>
> function foo(task,data::MyData)
>     data.myint + data.myarray[data.mydict[task]]
> end
>
> n = 10
> const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
>
> results = zeros(n)
> for i = 1:n
>     results[i] = foo(i,data)
> end
>
> # What's the right way to do this in parallel? Here are a number of ideas
> # To use @parallel or pmap, we have to first copy all the code and data everywhere
> # I'd like to avoid that, since the data is huge (10 - 100 GB)
>
> @everywhere begin
>     type MyData
>         myint
>         mydict
>         myarray
>     end
>
>     function foo(task,data::MyData)
>         data.myint + data.myarray[data.mydict[task]]
>     end
>
>     n = 10
>     const data = MyData(rand(),Dict(1:n,randperm(n)),randperm(n))
> end
>
> ## @parallel
> results = zeros(n)
> @parallel for i = 1:n
>     results[i] = foo(i,data)
> end
>
> ## pmap
> @everywhere foo(task) = foo(task,data)
> results = pmap(foo,1:n)
>
> # To avoid copying data, I can make myarray a shared array
> # In that case, I don't want to use @everywhere to put data on each processor
> # since that would reinstantiate the shared array.
> # My current solution is to rewrite my data structure to *not* include myarray,
> # and pass the array to the function foo separately.
> # But the code gets much less pretty as I tear apart my data structure,
> # especially if I have a large number of shared arrays.
> # Is there a way for me to avoid this while using shared memory?
> # really, I'd like to be able to define my own shared memory data types...
>
> @everywhere begin
>     type MySmallerData
>         myint
>         mydict
>     end
>
>     function foo(task,data::MySmallerData,myarray::SharedArray)
>         data.myint + myarray[data.mydict[task]]
>     end
>
>     n = 10
>     const data = MySmallerData(rand(),Dict(1:n,randperm(n)))
> end
>
> myarray = SharedArray(randperm(n))
>
> ## @parallel
> results = zeros(n)
> @parallel for i = 1:n
>     results[i] = foo(i,data,myarray)
> end
>
> ## pmap
> @everywhere foo(task) = foo(task,data,myarray)
> results = pmap(foo,1:n)
>
> # Finally, what can I do to avoid copying mydict to each processor?
> # Is there a way to use shared memory for it?
> # Once again, I'd really like to be able to define my own shared memory data types...
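On the "tearing apart my data structure" point: nothing prevents a composite type from holding a SharedArray field, so the original layout can be kept as long as the big array's element type is a bitstype. A sketch under that assumption (MySharedData is a hypothetical name; the wrapper and the Dict are still serialized to each worker, only the array's buffer is shared):

```julia
type MySharedData
    myint
    mydict
    myarray::SharedArray   # only this field's buffer lives in shared memory
end

function foo(task, data::MySharedData)
    # Same body as the original foo; no need to pass myarray separately.
    data.myint + data.myarray[data.mydict[task]]
end

n = 10
# Share only the large array; the small fields are copied as before.
myarray = convert(SharedArray, randperm(n))
data = MySharedData(rand(), Dict(1:n, randperm(n)), myarray)
```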
