Are you talking about MPI communication? If so, then you need to update every proc's modex info for the proc that moved. That info is stored in each MPI proc's own memory, so it isn't something a proc can simply fetch from its daemon on demand. You'll have to push the update to every single proc directly so that it has the new contact info if/when it decides to send an MPI message to the proc that moved.
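To make that concrete, here is a minimal, self-contained sketch of the idea (these are NOT Open MPI's real structures; peer_entry_t, peer_table, and update_peer_uri are hypothetical names, and the URI format is just illustrative): each proc keeps its own rank -> URI table, and an update for the restarted peer must be applied in every proc's copy of that table.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-proc view of its peers; not Open MPI's actual modex storage. */
typedef struct {
    int   rank;   /* peer's rank in the job */
    char *uri;    /* contact info, e.g. "21222:tcp:192.168.1.1:1250" */
} peer_entry_t;

static peer_entry_t *peer_table = NULL;
static int           num_peers  = 0;

/* Replace the stored URI for one peer - this is what every proc would
 * have to do locally when an update for a restarted proc arrives. */
static int update_peer_uri(int rank, const char *new_uri)
{
    for (int i = 0; i < num_peers; i++) {
        if (peer_table[i].rank == rank) {
            free(peer_table[i].uri);
            peer_table[i].uri = strdup(new_uri);
            return 0;
        }
    }
    return -1;  /* unknown peer */
}

int main(void)
{
    /* two peers known from the initial modex */
    peer_entry_t peers[2] = {
        { 0, strdup("21222:tcp:192.168.1.1:1250") },
        { 1, strdup("21223:tcp:192.168.1.2:1250") },
    };
    peer_table = peers;
    num_peers  = 2;

    /* rank 1 was restarted on another node -> apply the update locally */
    update_peer_uri(1, "21230:tcp:192.168.1.5:1250");
    printf("rank 1 now at %s\n", peer_table[1].uri);
    return 0;
}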
This is why we do a modex upon restart: sending the change to every MPI proc individually is hardly scalable without a collective operation. See the modex database interface in orte/mca/grpcomm/base/grpcomm_base_modex.c. You'll have to write new code to send/recv an update message, but the code to update the database entry already exists. (A rough sketch of what such an update message might carry is at the bottom of this mail, below the quoted thread.)

On Jun 2, 2011, at 7:52 AM, Hugo Meyer wrote:

> Hello again.
>
> My actual problem is that I don't know where the struct is that holds the
> information used to send messages to the procs.
>
> Something like:
>
> Rank   URI
> 0      21222:tcp:192.168.1.1:1250
> 1      21223:tcp:192.168.1.2:1250
> ...    ...
>
> I need to update it when I move a process away from its original site. Is
> there something like this?
>
> Thanks a lot.
>
> Hugo
>
> 2011/5/31 Hugo Meyer <meyer.h...@gmail.com>
> Hello @ll.
>
> I need some help to re-establish communication with a process that I
> restore on a different node. My situation is as follows:
>
> A process fails and is restored successfully on another node from a
> previous checkpoint that I sent there. Now, when another process tries to
> send a message to the restored process, the send will fail, or at least
> block in ompi_request_wait_completion.
>
> So, when this happens I have to send a message to the sender's daemon,
> which will have the URI of where the process has been restored; the daemon
> answers the proc with that info so the proc can update it.
>
> So, I need to know where in the code I can catch this send attempt and
> then send the message to the sender's daemon, and where and how I can
> update this info so that the message goes to the right place (same rank,
> but new URI).
>
> I have to do it this way to avoid a collective communication.
>
> If you can give me a hand with this, it would be great.
>
> Best regards.
>
> Hugo
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
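As promised above, here is a rough, hypothetical sketch of the send/recv update message: the daemon packs (rank, new URI) into a flat buffer, and every proc that receives it unpacks the pair and applies it to its local peer table (update_peer_uri from the earlier sketch). Plain memcpy is used here as a stand-in for ORTE's real buffer/packing machinery; pack_uri_update and recv_uri_update are made-up names.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Pack rank + URI into a newly allocated buffer; caller frees it. */
static uint8_t *pack_uri_update(int32_t rank, const char *uri, size_t *len)
{
    size_t uri_len = strlen(uri) + 1;          /* include the NUL */
    *len = sizeof(rank) + uri_len;
    uint8_t *buf = malloc(*len);
    memcpy(buf, &rank, sizeof(rank));
    memcpy(buf + sizeof(rank), uri, uri_len);
    return buf;
}

/* Unpack on the receiving proc and hand the result to the local table. */
static void recv_uri_update(const uint8_t *buf, size_t len)
{
    int32_t rank;
    memcpy(&rank, buf, sizeof(rank));
    const char *uri = (const char *)(buf + sizeof(rank));
    (void)len;
    printf("applying update: rank %d -> %s\n", rank, uri);
    /* update_peer_uri(rank, uri);  -- see the earlier sketch */
}

int main(void)
{
    size_t len;
    uint8_t *msg = pack_uri_update(1, "21230:tcp:192.168.1.5:1250", &len);
    recv_uri_update(msg, len);   /* in reality this runs in every other proc */
    free(msg);
    return 0;
}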