The ORTE routed framework fills in that list for you; it has an API for that purpose.
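
To illustrate what the routed:binomial lines in the quoted log are computing, here is a minimal, self-contained sketch of a binomial tree in C. It is not the ORTE source: the function names (highest_bit, binomial_parent, binomial_children, print_tree) are made up for this example, and it assumes daemons are numbered 0..num_procs-1 with rank 0 as the root.

```c
#include <assert.h>
#include <stdio.h>

/* Index of the highest set bit of v, or -1 when v == 0. */
int highest_bit(int v)
{
    int bit = -1;
    while (v > 0) {
        v >>= 1;
        ++bit;
    }
    return bit;
}

/* Parent of `rank` in a binomial tree rooted at 0: clear the highest
 * set bit. Rank 0 is reported as its own parent. */
int binomial_parent(int rank)
{
    assert(rank >= 0);
    if (rank == 0) {
        return 0;
    }
    return rank & ~(1 << highest_bit(rank));
}

/* Store the children of `rank` in children[] (at most max_children
 * entries) and return how many were stored. A child is formed by
 * setting one bit above the highest set bit of `rank`. */
int binomial_children(int rank, int num_procs, int *children, int max_children)
{
    int count = 0;
    assert(num_procs > 0 && rank >= 0 && rank < num_procs);
    /* i < 30 keeps (1 << i) inside the range of a 32-bit int. */
    for (int i = highest_bit(rank) + 1; i < 30; ++i) {
        int peer = rank | (1 << i);
        if (peer >= num_procs || count == max_children) {
            break; /* peers only grow with i, so no later one can fit */
        }
        children[count++] = peer;
    }
    return count;
}

/* Hypothetical helper: print one summary line per rank. */
void print_tree(int num_procs)
{
    for (int rank = 0; rank < num_procs; ++rank) {
        int kids[32];
        int n = binomial_children(rank, num_procs, kids, 32);
        printf("rank %d: parent %d num_children %d\n",
               rank, binomial_parent(rank), n);
    }
}
```

Calling print_tree(2) reports one child for rank 0 and none for rank 1, the same counts as the num_children lines in the log. If the targets list is built from a daemon's children in the routing tree (an assumption here, not something the log confirms), an empty list on daemon 1 would be the expected result for a two-daemon job.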

> On May 3, 2017, at 12:17 AM, Justin Cinkelj <justin.cink...@xlab.si> wrote:
>
> Important detail first: I get this message from significantly modified Open
> MPI code, so the problem exists solely due to my mistake.
>
> Orterun on 192.168.122.90 starts orted on remote node 192.168.122.91, then
> orted figures out it has nothing to do.
> If I request to start workers on the same 192.168.122.90 IP, mpi_hello is
> started.
>
> Partial log:
> /usr/bin/mpirun -np 1 ... mpi_hello
> #
> [osv:00252] [[50738,0],0] plm:base:setup_job
> [osv:00252] [[50738,0],0] plm:base:setup_vm
> [osv:00252] [[50738,0],0] plm:base:setup_vm creating map
> [osv:00252] [[50738,0],0] setup:vm: working unmanaged allocation
> [osv:00252] [[50738,0],0] using dash_host
> [osv:00252] [[50738,0],0] checking node 192.168.122.91
> [osv:00252] [[50738,0],0] plm:base:setup_vm add new daemon [[50738,0],1]
> [osv:00252] [[50738,0],0] plm:base:setup_vm assigning new daemon [[50738,0],1] to node 192.168.122.91
> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 0 num_procs 2
> [osv:00252] [[50738,0],0] routed:binomial 0 found child 1
> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 1 num_procs 2
> [osv:00252] [[50738,0],0] routed:binomial find children of rank 0
> [osv:00252] [[50738,0],0] routed:binomial find children checking peer 1
> [osv:00252] [[50738,0],0] routed:binomial find children computing tree
> [osv:00252] [[50738,0],0] routed:binomial rank 1 parent 0 me 1 num_procs 2
> [osv:00252] [[50738,0],0] routed:binomial find children returning found value 0
> [osv:00252] [[50738,0],0]: parent 0 num_children 1
> [osv:00252] [[50738,0],0]: child 1
> [osv:00252] [[50738,0],0] plm:osvrest: launching vm
> #
> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn called
> [osv:00250] [[50738,0],1] routed:binomial rank 0 parent 0 me 1 num_procs 2
> [osv:00250] [[50738,0],1] routed:binomial find children of rank 0
> [osv:00250] [[50738,0],1] routed:binomial find children checking peer 1
> [osv:00250] [[50738,0],1] routed:binomial find children computing tree
> [osv:00250] [[50738,0],1] routed:binomial rank 1 parent 0 me 1 num_procs 2
> [osv:00250] [[50738,0],1] routed:binomial find children returning found value 0
> [osv:00250] [[50738,0],1]: parent 0 num_children 0
> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn - have no children!
>
> In the remote_spawn() function of my plm mca module (my plm is based on
> orte/mca/plm/rsh/), the &coll.targets list has zero length. My question is:
> which module(s) are responsible for filling in coll.targets? Then I will
> turn on the correct mca xyz_base_verbose level and hopefully narrow down my
> problem. I have quite a problem guessing/finding out what the various xyz
> strings mean :)
>
> Thank you,
> Justin
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel