The orte routed framework does that for you - there is an API for that purpose.
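
For orientation, here is a minimal Python sketch of how a binomial routing tree hands out children. It is an illustration, not the ORTE source: the function name `binomial_children` is hypothetical, and the bit-mask rule is the textbook binomial-tree construction, assumed here to match what the routed:binomial lines in the quoted log report.

```python
def binomial_children(rank, num_procs):
    """Return the direct children of `rank` in a binomial tree rooted at rank 0.

    Hypothetical helper for illustration only; not an Open MPI API.
    """
    # Index of the highest set bit in rank (-1 for rank 0).
    hibit = rank.bit_length() - 1
    children = []
    bit = hibit + 1
    # Try every bit above the highest set bit; each one names a candidate child.
    while (1 << bit) < num_procs:
        peer = rank | (1 << bit)
        if peer < num_procs:
            children.append(peer)
        bit += 1
    return children


if __name__ == "__main__":
    # Two daemons, as in the quoted log (num_procs 2).
    for rank in range(2):
        print(rank, binomial_children(rank, 2))
```

Under this rule, with two daemons rank 0 gets rank 1 as its only child and rank 1 gets none, which is consistent with the "parent 0 num_children 1" and "parent 0 num_children 0" lines in the quoted log.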


> On May 3, 2017, at 12:17 AM, Justin Cinkelj <justin.cink...@xlab.si> wrote:
> 
> Important detail first: I get this message from significantly modified Open 
> MPI code, so the problem exists solely due to my own mistake.
> 
> Orterun on 192.168.122.90 starts orted on remote node 192.168.122.91, then 
> orted figures out it has nothing to do.
> If I request to start workers on the same 192.168.122.90 IP, the mpi_hello is 
> started.
> 
> Partial log:
> /usr/bin/mpirun -np 1 ... mpi_hello
> #
> [osv:00252] [[50738,0],0] plm:base:setup_job
> [osv:00252] [[50738,0],0] plm:base:setup_vm
> [osv:00252] [[50738,0],0] plm:base:setup_vm creating map
> [osv:00252] [[50738,0],0] setup:vm: working unmanaged allocation
> [osv:00252] [[50738,0],0] using dash_host
> [osv:00252] [[50738,0],0] checking node 192.168.122.91
> [osv:00252] [[50738,0],0] plm:base:setup_vm add new daemon [[50738,0],1]
> [osv:00252] [[50738,0],0] plm:base:setup_vm assigning new daemon [[50738,0],1] to node 192.168.122.91
> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 0 num_procs 2
> [osv:00252] [[50738,0],0] routed:binomial 0 found child 1
> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 1 num_procs 2
> [osv:00252] [[50738,0],0] routed:binomial find children of rank 0
> [osv:00252] [[50738,0],0] routed:binomial find children checking peer 1
> [osv:00252] [[50738,0],0] routed:binomial find children computing tree
> [osv:00252] [[50738,0],0] routed:binomial rank 1 parent 0 me 1 num_procs 2
> [osv:00252] [[50738,0],0] routed:binomial find children returning found value > 0
> [osv:00252] [[50738,0],0]: parent 0 num_children 1
> [osv:00252] [[50738,0],0]:      child 1
> [osv:00252] [[50738,0],0] plm:osvrest: launching vm
> #
> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn called
> [osv:00250] [[50738,0],1] routed:binomial rank 0 parent 0 me 1 num_procs 2
> [osv:00250] [[50738,0],1] routed:binomial find children of rank 0
> [osv:00250] [[50738,0],1] routed:binomial find children checking peer 1
> [osv:00250] [[50738,0],1] routed:binomial find children computing tree
> [osv:00250] [[50738,0],1] routed:binomial rank 1 parent 0 me 1 num_procs 2
> [osv:00250] [[50738,0],1] routed:binomial find children returning found value > 0
> [osv:00250] [[50738,0],1]: parent 0 num_children 0
> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn - have no children!
> 
> In the remote_spawn() function of my plm mca module (my plm is based on 
> orte/mca/plm/rsh/), the &coll.targets list has zero length. My question is: 
> which module(s) are responsible for filling in coll.targets? Once I know, I 
> will turn on the corresponding mca xyz_base_verbose level and hopefully 
> narrow down my problem. I have a hard time guessing what the various xyz 
> strings mean :)
> 
> Thank you, Justin
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
