From the documentation - "Modules in Julia are separate global variable
workspaces."
So what is happening is that the anonymous function in "remotecall(i,
x->(global const X=x; nothing), localX)" creates X as a global in module
ParallelStuff (the module in which the closure is defined), not in Main.
The following works:
module ParallelStuff
export doparallelstuff

function doparallelstuff(m = 10, n = 20)
    # initialize variables
    localX = Base.shmem_rand(m; pids=procs())
    localY = Base.shmem_rand(n; pids=procs())
    localf = [x->i+sum(x) for i=1:m]
    localg = [x->i+sum(x) for i=1:n]

    # broadcast variables to all worker processes (thanks to Amit Murthy for suggesting this syntax)
    @sync begin
        for i in procs(localX)
            remotecall(i, x->(global X=x; nothing), localX)
            remotecall(i, x->(global Y=x; nothing), localY)
            remotecall(i, x->(global f=x; nothing), localf)
            remotecall(i, x->(global g=x; nothing), localg)
        end
    end

    # compute
    for iteration=1:1
        @everywhere begin
            X = ParallelStuff.X
            Y = ParallelStuff.Y
            f = ParallelStuff.f
            g = ParallelStuff.g
            for i = localindexes(X)
                X[i] = f[i](Y)
            end
            for j = localindexes(Y)
                Y[j] = g[j](X)
            end
        end
    end
end

end #module
While remotecall, @everywhere, etc. run under Main, the fact that the
closure's globals refer to module ParallelStuff is pretty confusing.
I think we need a better way to handle this.
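
The scoping rule is easy to see in isolation. A minimal sketch (module name M
is made up just for the example):

module M
setX(v) = (global X = v; nothing)   # function defined inside M
end

M.setX(42)
isdefined(Main, :X)   # false -- nothing was created in Main
M.X                   # 42 -- the global landed in M, where the function was defined

The same thing happens on each worker: the remotecall closure binds X inside
ParallelStuff, while @everywhere evaluates its block in Main, hence the
X = ParallelStuff.X aliasing step above.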
On Tue, Dec 2, 2014 at 4:58 AM, Madeleine Udell <[email protected]>
wrote:
> Thanks to Blake and Amit for some excellent suggestions! Both strategies
> work fine when embedded in functions, but not when those functions are
> embedded in modules. For example, the following throws an error:
>
> @everywhere include("ParallelStuff.jl")
> @everywhere using ParallelStuff
> doparallelstuff()
>
> when ParallelStuff.jl contains the following code:
>
> module ParallelStuff
> export doparallelstuff
>
> function doparallelstuff(m = 10, n = 20)
> # initialize variables
> localX = Base.shmem_rand(m; pids=procs())
> localY = Base.shmem_rand(n; pids=procs())
> localf = [x->i+sum(x) for i=1:m]
> localg = [x->i+sum(x) for i=1:n]
>
> # broadcast variables to all worker processes (thanks to Amit Murthy for suggesting this syntax)
> @sync begin
> for i in procs(localX)
> remotecall(i, x->(global const X=x; nothing), localX)
> remotecall(i, x->(global const Y=x; nothing), localY)
> remotecall(i, x->(global const f=x; nothing), localf)
> remotecall(i, x->(global const g=x; nothing), localg)
> end
> end
>
> # compute
> for iteration=1:1
> @everywhere for i=localindexes(X)
> X[i] = f[i](Y)
> end
> @everywhere for j=localindexes(Y)
> Y[j] = g[j](X)
> end
> end
> end
>
> end #module
>
> On 3 processes (julia -p 3), the error is as follows:
>
> exception on 1: exception on 2: exception on 3: ERROR: X not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> ERROR: X not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in anonymous at multi.jl:848
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> ERROR: X not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in anonymous at multi.jl:848
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> exception on exception on 2: 1: ERROR: Y not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in anonymous at multi.jl:848
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> ERROR: Y not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> exception on 3: ERROR: Y not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in anonymous at multi.jl:848
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
>
> For comparison, the non-modularized version works:
>
> function doparallelstuff(m = 10, n = 20)
> # initialize variables
> localX = Base.shmem_rand(m; pids=procs())
> localY = Base.shmem_rand(n; pids=procs())
> localf = [x->i+sum(x) for i=1:m]
> localg = [x->i+sum(x) for i=1:n]
>
> # broadcast variables to all worker processes (thanks to Amit Murthy for suggesting this syntax)
> @sync begin
> for i in procs(localX)
> remotecall(i, x->(global const X=x; nothing), localX)
> remotecall(i, x->(global const Y=x; nothing), localY)
> remotecall(i, x->(global const f=x; nothing), localf)
> remotecall(i, x->(global const g=x; nothing), localg)
> end
> end
>
> # compute
> for iteration=1:1
> @everywhere for i=localindexes(X)
> X[i] = f[i](Y)
> end
> @everywhere for j=localindexes(Y)
> Y[j] = g[j](X)
> end
> end
> end
>
> doparallelstuff()
>
> On Mon, Nov 24, 2014 at 11:24 AM, Blake Johnson <[email protected]>
> wrote:
>
>> I use this macro to send variables to remote processes:
>>
>> macro sendvar(proc, x)
>> quote
>> rr = RemoteRef()
>> put!(rr, $x)
>> remotecall($proc, (rr)->begin
>> global $(esc(x))
>> $(esc(x)) = fetch(rr)
>> end, rr)
>> end
>> end
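>>
>> A hypothetical usage sketch (assuming a worker with id 2 and that the macro
>> is expanded in Main):
>>
>> a = rand(5)
>> r = @sendvar 2 a                         # remotecall returns a RemoteRef
>> wait(r)                                  # make sure the assignment has run on worker 2
>> remotecall_fetch(2, () -> sum(Main.a))   # worker 2 now has its own global `a`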
>>
>> Though the solution above looks a little simpler.
>>
>> --Blake
>>
>> On Sunday, November 23, 2014 1:30:49 AM UTC-5, Amit Murthy wrote:
>>>
>>> From the description of Base.localize_vars - 'wrap an expression in "let
>>> a=a,b=b,..." for each var it references'
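>>>
>>> Conceptually (a sketch, not the exact expansion), for an expression that
>>> references a local variable, say sum(localX), the generated thunk gets
>>> wrapped as
>>>
>>> let localX = localX
>>>     () -> sum(localX)
>>> end
>>>
>>> so the current value of localX is captured by the closure and shipped with
>>> it, instead of being looked up by name on the remote process.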
>>>
>>> Though that does not seem to be the only(?) issue here...
>>>
>>> On Sun, Nov 23, 2014 at 11:52 AM, Madeleine Udell <[email protected]>
>>> wrote:
>>>
>>>> Thanks! This is extremely helpful.
>>>>
>>>> Can you tell me more about what localize_vars does?
>>>>
>>>> On Sat, Nov 22, 2014 at 9:11 PM, Amit Murthy <[email protected]>
>>>> wrote:
>>>>
>>>>> This works:
>>>>>
>>>>> function doparallelstuff(m = 10, n = 20)
>>>>> # initialize variables
>>>>> localX = Base.shmem_rand(m; pids=procs())
>>>>> localY = Base.shmem_rand(n; pids=procs())
>>>>> localf = [x->i+sum(x) for i=1:m]
>>>>> localg = [x->i+sum(x) for i=1:n]
>>>>>
>>>>> # broadcast variables to all worker processes
>>>>> @sync begin
>>>>> for i in procs(localX)
>>>>> remotecall(i, x->(global X; X=x; nothing), localX)
>>>>> remotecall(i, x->(global Y; Y=x; nothing), localY)
>>>>> remotecall(i, x->(global f; f=x; nothing), localf)
>>>>> remotecall(i, x->(global g; g=x; nothing), localg)
>>>>> end
>>>>> end
>>>>>
>>>>> # compute
>>>>> for iteration=1:1
>>>>> @everywhere for i=localindexes(X)
>>>>> X[i] = f[i](Y)
>>>>> end
>>>>> @everywhere for j=localindexes(Y)
>>>>> Y[j] = g[j](X)
>>>>> end
>>>>> end
>>>>> end
>>>>>
>>>>> doparallelstuff()
>>>>>
>>>>> Though I would have expected broadcast of variables to be possible
>>>>> with just
>>>>> @everywhere X=localX
>>>>> and so on ....
>>>>>
>>>>>
>>>>> Looks like @everywhere does not call localize_vars. I don't know if
>>>>> this is by design or just an oversight. I would have expected it to do so.
>>>>> Will file an issue on github.
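>>>>>
>>>>> In the meantime, a sketch of the difference (assuming everything runs at
>>>>> Main's top level):
>>>>>
>>>>> localX = 1
>>>>>
>>>>> # @everywhere ships the expression verbatim, so each worker looks up
>>>>> # `localX` in its own Main, where it is not defined:
>>>>> #     @everywhere X = localX      # fails on the workers
>>>>>
>>>>> # Passing the value as an argument to a closure works, because the value
>>>>> # travels with the call:
>>>>> for p in workers()
>>>>>     remotecall_fetch(p, x->(global X=x; nothing), localX)
>>>>> end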
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Nov 23, 2014 at 8:24 AM, Madeleine Udell <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> The code block I posted before works, but throws an error when
>>>>>> embedded in a function: "ERROR: X not defined" (in first line of
>>>>>> @parallel). Why am I getting this error when I'm *assigning to* X?
>>>>>>
>>>>>> function doparallelstuff(m = 10, n = 20)
>>>>>> # initialize variables
>>>>>> localX = Base.shmem_rand(m)
>>>>>> localY = Base.shmem_rand(n)
>>>>>> localf = [x->i+sum(x) for i=1:m]
>>>>>> localg = [x->i+sum(x) for i=1:n]
>>>>>>
>>>>>> # broadcast variables to all worker processes
>>>>>> @parallel for i=workers()
>>>>>> global X = localX
>>>>>> global Y = localY
>>>>>> global f = localf
>>>>>> global g = localg
>>>>>> end
>>>>>> # give variables same name on master
>>>>>> X,Y,f,g = localX,localY,localf,localg
>>>>>>
>>>>>> # compute
>>>>>> for iteration=1:1
>>>>>> @everywhere for i=localindexes(X)
>>>>>> X[i] = f[i](Y)
>>>>>> end
>>>>>> @everywhere for j=localindexes(Y)
>>>>>> Y[j] = g[j](X)
>>>>>> end
>>>>>> end
>>>>>> end
>>>>>>
>>>>>> doparallelstuff()
>>>>>>
>>>>>> On Fri, Nov 21, 2014 at 5:13 PM, Madeleine Udell <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> My experiments with parallelism also occur in focused blocks; I
>>>>>>> think that's a sign that it's not yet as user friendly as it could be.
>>>>>>>
>>>>>>> Here's a solution to the problem I posed that's simple to use:
>>>>>>> @parallel + global can be used to broadcast a variable, while
>>>>>>> @everywhere
>>>>>>> can be used to do a computation on local data (ie, without resending the
>>>>>>> data). I'm not sure how to do the variable renaming programmatically,
>>>>>>> though.
>>>>>>>
>>>>>>> # initialize variables
>>>>>>> m,n = 10,20
>>>>>>> localX = Base.shmem_rand(m)
>>>>>>> localY = Base.shmem_rand(n)
>>>>>>> localf = [x->i+sum(x) for i=1:m]
>>>>>>> localg = [x->i+sum(x) for i=1:n]
>>>>>>>
>>>>>>> # broadcast variables to all worker processes
>>>>>>> @parallel for i=workers()
>>>>>>> global X = localX
>>>>>>> global Y = localY
>>>>>>> global f = localf
>>>>>>> global g = localg
>>>>>>> end
>>>>>>> # give variables same name on master
>>>>>>> X,Y,f,g = localX,localY,localf,localg
>>>>>>>
>>>>>>> # compute
>>>>>>> for iteration=1:10
>>>>>>> @everywhere for i=localindexes(X)
>>>>>>> X[i] = f[i](Y)
>>>>>>> end
>>>>>>> @everywhere for j=localindexes(Y)
>>>>>>> Y[j] = g[j](X)
>>>>>>> end
>>>>>>> end
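>>>>>>>
>>>>>>> On the renaming question above, a rough (and purely hypothetical) sketch
>>>>>>> of binding the names programmatically by eval'ing an assignment into Main
>>>>>>> on every process:
>>>>>>>
>>>>>>> for (name, val) in [(:X, localX), (:Y, localY), (:f, localf), (:g, localg)]
>>>>>>>     eval(Main, Expr(:(=), name, val))        # bind on the master
>>>>>>>     for p in workers()
>>>>>>>         remotecall(p, (n, v)->(eval(Main, Expr(:(=), n, v)); nothing), name, val)
>>>>>>>     end
>>>>>>> end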
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 11:14 AM, Tim Holy <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> My experiments with parallelism tend to occur in focused blocks,
>>>>>>>> and I haven't
>>>>>>>> done it in quite a while. So I doubt I can help much. But in
>>>>>>>> general I suspect
>>>>>>>> you're encountering these problems because much of the IPC goes
>>>>>>>> through
>>>>>>>> thunks, and so a lot of stuff gets reclaimed when execution is done.
>>>>>>>>
>>>>>>>> If I were experimenting, I'd start by trying to create RemoteRef()s
>>>>>>>> and put!
>>>>>>>> ()ing my variables into them. Then perhaps you might be able to
>>>>>>>> fetch them
>>>>>>>> from other processes. Not sure that will work, but it seems to be
>>>>>>>> worth a try.
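>>>>>>>>
>>>>>>>> Something along these lines (untested sketch, assuming a worker with
>>>>>>>> id 2 and a local variable localX):
>>>>>>>>
>>>>>>>> rr = RemoteRef()
>>>>>>>> put!(rr, localX)
>>>>>>>> # on the worker, pull the value out of the reference and bind it to a global
>>>>>>>> remotecall(2, r->(global X = fetch(r); nothing), rr)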
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>> --Tim
>>>>>>>>
>>>>>>>> On Thursday, November 20, 2014 08:20:19 PM Madeleine Udell wrote:
>>>>>>>> > I'm trying to use parallelism in julia for a task with a
>>>>>>>> structure that I
>>>>>>>> > think is quite pervasive. It looks like this:
>>>>>>>> >
>>>>>>>> > # broadcast lists of functions f and g to all processes so they're
>>>>>>>> > available everywhere
>>>>>>>> > # create shared arrays X,Y on all processes so they're available
>>>>>>>> everywhere
>>>>>>>> > for iteration=1:1000
>>>>>>>> > @parallel for i=1:size(X)
>>>>>>>> > X[i] = f[i](Y)
>>>>>>>> > end
>>>>>>>> > @parallel for j=1:size(Y)
>>>>>>>> > Y[j] = g[j](X)
>>>>>>>> > end
>>>>>>>> > end
>>>>>>>> >
>>>>>>>> > I'm having trouble making this work, and I'm not sure where to
>>>>>>>> dig around
>>>>>>>> > to find a solution. Here are the difficulties I've encountered:
>>>>>>>> >
>>>>>>>> > * @parallel doesn't allow me to create persistent variables on
>>>>>>>> each
>>>>>>>> > process; ie, the following results in an error.
>>>>>>>> >
>>>>>>>> > s = Base.shmem_rand(12,3)
>>>>>>>> > @parallel for i=1:nprocs() m,n = size(s) end
>>>>>>>> > @parallel for i=1:nprocs() println(m) end
>>>>>>>> >
>>>>>>>> > * @everywhere does allow me to create persistent variables on
>>>>>>>> each process,
>>>>>>>> > but doesn't send any data at all, including the variables I need
>>>>>>>> in order
>>>>>>>> > to define new variables. Eg the following is an error: s is a
>>>>>>>> shared array,
>>>>>>>> > but the variable (ie pointer to) s is apparently not shared.
>>>>>>>> > s = Base.shmem_rand(12,3)
>>>>>>>> > @everywhere m,n = size(s)
>>>>>>>> >
>>>>>>>> > Here are the kinds of questions I'd like to see protocode for:
>>>>>>>> > * How can I broadcast a variable so that it is available and
>>>>>>>> persistent on
>>>>>>>> > every process?
>>>>>>>> > * How can I create a reference to the same shared array "s" that
>>>>>>>> is
>>>>>>>> > accessible from every process?
>>>>>>>> > * How can I send a command to be performed in parallel,
>>>>>>>> specifying which
>>>>>>>> > variables should be sent to the relevant processes and which
>>>>>>>> should be
>>>>>>>> > looked up in the local namespace?
>>>>>>>> >
>>>>>>>> > Note that everything I ask above is not specific to shared
>>>>>>>> arrays; the same
>>>>>>>> > constructs would also be extremely useful in the distributed case.
>>>>>>>> >
>>>>>>>> > ----------------------
>>>>>>>> >
>>>>>>>> > An interesting partial solution is the following:
>>>>>>>> > funcs! = Function[x->x[:] = x+k for k=1:3]
>>>>>>>> > d = drand(3,12)
>>>>>>>> > let funcs! = funcs!
>>>>>>>> > @sync @parallel for k in 1:3
>>>>>>>> > funcs)
>>>>>>>> > end
>>>>>>>> > end
>>>>>>>> >
>>>>>>>> > Here, I'm not sure why the let statement is necessary to send
>>>>>>>> funcs!, since
>>>>>>>> > d is sent automatically.
>>>>>>>> >
>>>>>>>> > ---------------------
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> > Madeleine
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>
>
> --
> Madeleine Udell
> PhD Candidate in Computational and Mathematical Engineering
> Stanford University
> www.stanford.edu/~udell
>