From the documentation - "Modules in Julia are separate global variable
workspaces."
So what is happening is that the anonymous function in "remotecall(i,
x->(global const X=x; nothing), localX)" creates X as a global in module
ParallelStuff (the module in which the closure is defined), not in Main.
The following works:
module ParallelStuff
export doparallelstuff

function doparallelstuff(m = 10, n = 20)
    # initialize variables
    localX = Base.shmem_rand(m; pids=procs())
    localY = Base.shmem_rand(n; pids=procs())
    localf = [x->i+sum(x) for i=1:m]
    localg = [x->i+sum(x) for i=1:n]

    # broadcast variables to all worker processes (thanks to Amit Murthy for suggesting this syntax)
    @sync begin
        for i in procs(localX)
            remotecall(i, x->(global X=x; nothing), localX)
            remotecall(i, x->(global Y=x; nothing), localY)
            remotecall(i, x->(global f=x; nothing), localf)
            remotecall(i, x->(global g=x; nothing), localg)
        end
    end

    # compute
    for iteration=1:1
        @everywhere begin
            X = ParallelStuff.X
            Y = ParallelStuff.Y
            f = ParallelStuff.f
            g = ParallelStuff.g
            for i = localindexes(X)
                X[i] = f[i](Y)
            end
            for j = localindexes(Y)
                Y[j] = g[j](X)
            end
        end
    end
end

end #module
While remotecall, @everywhere, etc. run under Main, the fact that the
closure's globals refer to module ParallelStuff is pretty confusing.
I think we need a better way to handle this.
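
The scoping rule is easy to see in isolation. A minimal sketch (module name M
is made up just for the example):

module M
setX(v) = (global X = v; nothing)   # function defined inside M
end

M.setX(42)
isdefined(Main, :X)   # false -- nothing was created in Main
M.X                   # 42 -- the global landed in M, where the function was defined

The same thing happens on each worker: the remotecall closure binds X inside
ParallelStuff, while @everywhere evaluates its block in Main, hence the
X = ParallelStuff.X aliasing step above.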
On Tue, Dec 2, 2014 at 4:58 AM, Madeleine Udell <[email protected]>
wrote:
> Thanks to Blake and Amit for some excellent suggestions! Both strategies
> work fine when embedded in functions, but not when those functions are
> embedded in modules. For example, the following throws an error:
>
> @everywhere include("ParallelStuff.jl")
> @everywhere using ParallelStuff
> doparallelstuff()
>
> when ParallelStuff.jl contains the following code:
>
> module ParallelStuff
> export doparallelstuff
>
> function doparallelstuff(m = 10, n = 20)
> # initialize variables
> localX = Base.shmem_rand(m; pids=procs())
> localY = Base.shmem_rand(n; pids=procs())
> localf = [x->i+sum(x) for i=1:m]
> localg = [x->i+sum(x) for i=1:n]
>
> # broadcast variables to all worker processes (thanks to Amit Murthy for suggesting this syntax)
> @sync begin
> for i in procs(localX)
> remotecall(i, x->(global const X=x; nothing), localX)
> remotecall(i, x->(global const Y=x; nothing), localY)
> remotecall(i, x->(global const f=x; nothing), localf)
> remotecall(i, x->(global const g=x; nothing), localg)
> end
> end
>
> # compute
> for iteration=1:1
> @everywhere for i=localindexes(X)
> X[i] = f[i](Y)
> end
> @everywhere for j=localindexes(Y)
> Y[j] = g[j](X)
> end
> end
> end
>
> end #module
>
> On 3 processes (julia -p 3), the error is as follows:
>
> exception on 1: exception on 2: exception on 3: ERROR: X not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> ERROR: X not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in anonymous at multi.jl:848
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> ERROR: X not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in anonymous at multi.jl:848
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> exception on exception on 2: 1: ERROR: Y not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in anonymous at multi.jl:848
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> ERROR: Y not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
> exception on 3: ERROR: Y not defined
> in anonymous at no file
> in eval at
> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
> in anonymous at multi.jl:1310
> in anonymous at multi.jl:848
> in run_work_thunk at multi.jl:621
> in run_work_thunk at multi.jl:630
> in anonymous at task.jl:6
>
> For comparison, the non-modularized version works:
>
> function doparallelstuff(m = 10, n = 20)
> # initialize variables
> localX = Base.shmem_rand(m; pids=procs())
> localY = Base.shmem_rand(n; pids=procs())
> localf = [x->i+sum(x) for i=1:m]
> localg = [x->i+sum(x) for i=1:n]
>
> # broadcast variables to all worker processes (thanks to Amit Murthy for suggesting this syntax)
> @sync begin
> for i in procs(localX)
> remotecall(i, x->(global const X=x; nothing), localX)
> remotecall(i, x->(global const Y=x; nothing), localY)
> remotecall(i, x->(global const f=x; nothing), localf)
> remotecall(i, x->(global const g=x; nothing), localg)
> end
> end
>
> # compute
> for iteration=1:1
> @everywhere for i=localindexes(X)
> X[i] = f[i](Y)
> end
> @everywhere for j=localindexes(Y)
> Y[j] = g[j](X)
> end
> end
> end
>
> doparallelstuff()
>
> On Mon, Nov 24, 2014 at 11:24 AM, Blake Johnson <[email protected]>
> wrote:
>
>> I use this macro to send variables to remote processes:
>>
>> macro sendvar(proc, x)
>> quote
>> rr = RemoteRef()
>> put!(rr, $x)
>> remotecall($proc, (rr)->begin
>> global $(esc(x))
>> $(esc(x)) = fetch(rr)
>> end, rr)
>> end
>> end
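>>
>> A hypothetical usage sketch (assuming a worker with id 2 and that the macro
>> is expanded in Main):
>>
>> a = rand(5)
>> r = @sendvar 2 a                         # remotecall returns a RemoteRef
>> wait(r)                                  # make sure the assignment has run on worker 2
>> remotecall_fetch(2, () -> sum(Main.a))   # worker 2 now has its own global `a`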
>>
>> Though the solution above looks a little simpler.
>>
>> --Blake
>>
>> On Sunday, November 23, 2014 1:30:49 AM UTC-5, Amit Murthy wrote:
>>>
>>> From the description of Base.localize_vars - 'wrap an expression in "let
>>> a=a,b=b,..." for each var it references'
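>>>
>>> Conceptually (a sketch, not the exact expansion), for an expression that
>>> references a local variable, say sum(localX), the generated thunk gets
>>> wrapped as
>>>
>>> let localX = localX
>>>     () -> sum(localX)
>>> end
>>>
>>> so the current value of localX is captured by the closure and shipped with
>>> it, instead of being looked up by name on the remote process.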
>>>
>>> Though that does not seem to be the only(?) issue here...
>>>
>>> On Sun, Nov 23, 2014 at 11:52 AM, Madeleine Udell <[email protected]>
>>> wrote:
>>>
>>>> Thanks! This is extremely helpful.
>>>>
>>>> Can you tell me more about what localize_vars does?
>>>>
>>>> On Sat, Nov 22, 2014 at 9:11 PM, Amit Murthy <[email protected]>
>>>> wrote:
>>>>
>>>>> This works:
>>>>>
>>>>> function doparallelstuff(m = 10, n = 20)
>>>>> # initialize variables
>>>>> localX = Base.shmem_rand(m; pids=procs())
>>>>> localY = Base.shmem_rand(n; pids=procs())
>>>>> localf = [x->i+sum(x) for i=1:m]
>>>>> localg = [x->i+sum(x) for i=1:n]
>>>>>
>>>>> # broadcast variables to all worker processes
>>>>> @sync begin
>>>>> for i in procs(localX)
>>>>> remotecall(i, x->(global X; X=x; nothing), localX)
>>>>> remotecall(i, x->(global Y; Y=x; nothing), localY)
>>>>> remotecall(i, x->(global f; f=x; nothing), localf)
>>>>> remotecall(i, x->(global g; g=x; nothing), localg)
>>>>> end
>>>>> end
>>>>>
>>>>> # compute
>>>>> for iteration=1:1
>>>>> @everywhere for i=localindexes(X)
>>>>> X[i] = f[i](Y)
>>>>> end
>>>>> @everywhere for j=localindexes(Y)
>>>>> Y[j] = g[j](X)
>>>>> end
>>>>> end
>>>>> end
>>>>>
>>>>> doparallelstuff()
>>>>>
>>>>> Though I would have expected broadcast of variables to be possible
>>>>> with just
>>>>> @everywhere X=localX
>>>>> and so on ....
>>>>>
>>>>>
>>>>> Looks like @everywhere does not call localize_vars. I don't know if
>>>>> this is by design or just an oversight. I would have expected it to do so.
>>>>> Will file an issue on github.
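>>>>>
>>>>> In the meantime, a sketch of the difference (assuming everything runs at
>>>>> Main's top level):
>>>>>
>>>>> localX = 1
>>>>>
>>>>> # @everywhere ships the expression verbatim, so each worker looks up
>>>>> # `localX` in its own Main, where it is not defined:
>>>>> #     @everywhere X = localX      # fails on the workers
>>>>>
>>>>> # Passing the value as an argument to a closure works, because the value
>>>>> # travels with the call:
>>>>> for p in workers()
>>>>>     remotecall_fetch(p, x->(global X=x; nothing), localX)
>>>>> end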
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Nov 23, 2014 at 8:24 AM, Madeleine Udell <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> The code block I posted before works, but throws an error when
>>>>>> embedded in a function: "ERROR: X not defined" (in first line of
>>>>>> @parallel). Why am I getting this error when I'm *assigning to* X?
>>>>>>
>>>>>> function doparallelstuff(m = 10, n = 20)
>>>>>> # initialize variables
>>>>>> localX = Base.shmem_rand(m)
>>>>>> localY = Base.shmem_rand(n)
>>>>>> localf = [x->i+sum(x) for i=1:m]
>>>>>> localg = [x->i+sum(x) for i=1:n]
>>>>>>
>>>>>> # broadcast variables to all worker processes
>>>>>> @parallel for i=workers()
>>>>>> global X = localX
>>>>>> global Y = localY
>>>>>> global f = localf
>>>>>> global g = localg
>>>>>> end
>>>>>> # give variables same name on master
>>>>>> X,Y,f,g = localX,localY,localf,localg
>>>>>>
>>>>>> # compute
>>>>>> for iteration=1:1
>>>>>> @everywhere for i=localindexes(X)
>>>>>> X[i] = f[i](Y)
>>>>>> end
>>>>>> @everywhere for j=localindexes(Y)
>>>>>> Y[j] = g[j](X)
>>>>>> end
>>>>>> end
>>>>>> end
>>>>>>
>>>>>> doparallelstuff()
>>>>>>
>>>>>> On Fri, Nov 21, 2014 at 5:13 PM, Madeleine Udell <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> My experiments with parallelism also occur in focused blocks; I
>>>>>>> think that's a sign that it's not yet as user friendly as it could be.
>>>>>>>
>>>>>>> Here's a solution to the problem I posed that's simple to use:
>>>>>>> @parallel + global can be used to broadcast a variable, while
>>>>>>> @everywhere
>>>>>>> can be used to do a computation on local data (ie, without resending the
>>>>>>> data). I'm not sure how to do the variable renaming programmatically,
>>>>>>> though.
>>>>>>>
>>>>>>> # initialize variables
>>>>>>> m,n = 10,20
>>>>>>> localX = Base.shmem_rand(m)
>>>>>>> localY = Base.shmem_rand(n)
>>>>>>> localf = [x->i+sum(x) for i=1:m]
>>>>>>> localg = [x->i+sum(x) for i=1:n]
>>>>>>>
>>>>>>> # broadcast variables to all worker processes
>>>>>>> @parallel for i=workers()
>>>>>>> global X = localX
>>>>>>> global Y = localY
>>>>>>> global f = localf
>>>>>>> global g = localg
>>>>>>> end
>>>>>>> # give variables same name on master
>>>>>>> X,Y,f,g = localX,localY,localf,localg
>>>>>>>
>>>>>>> # compute
>>>>>>> for iteration=1:10
>>>>>>> @everywhere for i=localindexes(X)
>>>>>>> X[i] = f[i](Y)
>>>>>>> end
>>>>>>> @everywhere for j=localindexes(Y)
>>>>>>> Y[j] = g[j](X)
>>>>>>> end
>>>>>>> end
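>>>>>>>
>>>>>>> On the renaming question above, a rough (and purely hypothetical) sketch
>>>>>>> of binding the names programmatically by eval'ing an assignment into Main
>>>>>>> on every process:
>>>>>>>
>>>>>>> for (name, val) in [(:X, localX), (:Y, localY), (:f, localf), (:g, localg)]
>>>>>>>     eval(Main, Expr(:(=), name, val))        # bind on the master
>>>>>>>     for p in workers()
>>>>>>>         remotecall(p, (n, v)->(eval(Main, Expr(:(=), n, v)); nothing), name, val)
>>>>>>>     end
>>>>>>> end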
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 11:14 AM, Tim Holy <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> My experiments with parallelism tend to occur in focused blocks,
>>>>>>>> and I haven't
>>>>>>>> done it in quite a while. So I doubt I can help much. But in
>>>>>>>> general I suspect
>>>>>>>> you're encountering these problems because much of the IPC goes
>>>>>>>> through
>>>>>>>> thunks, and so a lot of stuff gets reclaimed when execution is done.
>>>>>>>>
>>>>>>>> If I were experimenting, I'd start by trying to create RemoteRef()s
>>>>>>>> and put!
>>>>>>>> ()ing my variables into them. Then perhaps you might be able to
>>>>>>>> fetch them
>>>>>>>> from other processes. Not sure that will work, but it seems to be
>>>>>>>> worth a try.
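>>>>>>>>
>>>>>>>> Something along these lines (untested sketch, assuming a worker with
>>>>>>>> id 2 and a local variable localX):
>>>>>>>>
>>>>>>>> rr = RemoteRef()
>>>>>>>> put!(rr, localX)
>>>>>>>> # on the worker, pull the value out of the reference and bind it to a global
>>>>>>>> remotecall(2, r->(global X = fetch(r); nothing), rr)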
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>> --Tim
>>>>>>>>
>>>>>>>> On Thursday, November 20, 2014 08:20:19 PM Madeleine Udell wrote:
>>>>>>>> > I'm trying to use parallelism in julia for a task with a
>>>>>>>> structure that I
>>>>>>>> > think is quite pervasive. It looks like this:
>>>>>>>> >
>>>>>>>> > # broadcast lists of functions f and g to all processes so they're
>>>>>>>> > available everywhere
>>>>>>>> > # create shared arrays X,Y on all processes so they're available
>>>>>>>> everywhere
>>>>>>>> > for iteration=1:1000
>>>>>>>> > @parallel for i=1:size(X)
>>>>>>>> > X[i] = f[i](Y)
>>>>>>>> > end
>>>>>>>> > @parallel for j=1:size(Y)
>>>>>>>> > Y[j] = g[j](X)
>>>>>>>> > end
>>>>>>>> > end
>>>>>>>> >
>>>>>>>> > I'm having trouble making this work, and I'm not sure where to
>>>>>>>> dig around
>>>>>>>> > to find a solution. Here are the difficulties I've encountered:
>>>>>>>> >
>>>>>>>> > * @parallel doesn't allow me to create persistent variables on
>>>>>>>> each
>>>>>>>> > process; ie, the following results in an error.
>>>>>>>> >
>>>>>>>> > s = Base.shmem_rand(12,3)
>>>>>>>> > @parallel for i=1:nprocs() m,n = size(s) end
>>>>>>>> > @parallel for i=1:nprocs() println(m) end
>>>>>>>> >
>>>>>>>> > * @everywhere does allow me to create persistent variables on
>>>>>>>> each process,
>>>>>>>> > but doesn't send any data at all, including the variables I need
>>>>>>>> in order
>>>>>>>> > to define new variables. Eg the following is an error: s is a
>>>>>>>> shared array,
>>>>>>>> > but the variable (ie pointer to) s is apparently not shared.
>>>>>>>> > s = Base.shmem_rand(12,3)
>>>>>>>> > @everywhere m,n = size(s)
>>>>>>>> >
>>>>>>>> > Here are the kinds of questions I'd like to see protocode for:
>>>>>>>> > * How can I broadcast a variable so that it is available and
>>>>>>>> persistent on
>>>>>>>> > every process?
>>>>>>>> > * How can I create a reference to the same shared array "s" that
>>>>>>>> is
>>>>>>>> > accessible from every process?
>>>>>>>> > * How can I send a command to be performed in parallel,
>>>>>>>> specifying which
>>>>>>>> > variables should be sent to the relevant processes and which
>>>>>>>> should be
>>>>>>>> > looked up in the local namespace?
>>>>>>>> >
>>>>>>>> > Note that everything I ask above is not specific to shared
>>>>>>>> arrays; the same
>>>>>>>> > constructs would also be extremely useful in the distributed case.
>>>>>>>> >
>>>>>>>> > ----------------------
>>>>>>>> >
>>>>>>>> > An interesting partial solution is the following:
>>>>>>>> > funcs! = Function[x->x[:] = x+k for k=1:3]
>>>>>>>> > d = drand(3,12)
>>>>>>>> > let funcs! = funcs!
>>>>>>>> > @sync @parallel for k in 1:3
>>>>>>>> > funcs)
>>>>>>>> > end
>>>>>>>> > end
>>>>>>>> >
>>>>>>>> > Here, I'm not sure why the let statement is necessary to send
>>>>>>>> funcs!, since
>>>>>>>> > d is sent automatically.
>>>>>>>> >
>>>>>>>> > ---------------------
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> > Madeleine
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>
>
> --
> Madeleine Udell
> PhD Candidate in Computational and Mathematical Engineering
> Stanford University
> www.stanford.edu/~udell
>