Are you talking about an MPI communication? If so, then you need to update 
every proc's modex info for the proc that moved - this is something stored in 
each MPI proc's memory, so it isn't something that you can just get from the 
daemon on-demand. You'll have to provide the update to every single proc 
directly so that it has the info if/when it should decide to send an MPI 
message to the proc that moved.

This is why we do a modex upon restart - sending the change to every MPI proc 
is hardly scalable without a collective operation.

See the modex database interface in orte/mca/grpcomm/base/grpcomm_base_modex.c. 
You'll have to create new code to send/recv an update message, but the code to 
update the database entry exists.
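
To make the idea concrete, here is a minimal, self-contained sketch of what each proc would have to do when such an update arrives. It is not Open MPI code: contact_table_t, contact_update_t, and the contact_table_* functions are hypothetical names made up for illustration, and the real modex database stores more than a URI string. It only shows that the contact info lives in each proc's own memory, so every proc has to apply the update itself.

```c
/* Hypothetical sketch: none of these names are Open MPI or ORTE APIs. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROCS 64
#define MAX_URI   128

/* Per-process cache of peer contact info (a stand-in for the modex data). */
typedef struct {
    char uri[MAX_PROCS][MAX_URI];
    int  nprocs;
} contact_table_t;

/* Assumed layout of the update message announcing a proc's new location. */
typedef struct {
    int  rank;
    char uri[MAX_URI];
} contact_update_t;

/* Clear the table and record how many ranks it covers. */
void contact_table_init(contact_table_t *t, int nprocs)
{
    assert(t != NULL && nprocs >= 0 && nprocs <= MAX_PROCS);
    memset(t, 0, sizeof(*t));
    t->nprocs = nprocs;
}

/* Store the URI for a rank. Returns 0 on success, -1 if rank is out of range. */
int contact_table_set(contact_table_t *t, int rank, const char *uri)
{
    if (rank < 0 || rank >= t->nprocs) {
        return -1;
    }
    snprintf(t->uri[rank], MAX_URI, "%s", uri);
    return 0;
}

/* Apply a received update: same rank, new URI. */
int contact_table_apply_update(contact_table_t *t, const contact_update_t *u)
{
    return contact_table_set(t, u->rank, u->uri);
}

/* Look up where to send for a rank. Returns NULL if rank is out of range. */
const char *contact_table_lookup(const contact_table_t *t, int rank)
{
    if (rank < 0 || rank >= t->nprocs) {
        return NULL;
    }
    return t->uri[rank];
}
```

In this sketch, a proc that receives a contact_update_t for rank 1 calls contact_table_apply_update() and any later lookup for rank 1 returns the new URI. How the update message actually reaches every proc is the part you would have to write.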


On Jun 2, 2011, at 7:52 AM, Hugo Meyer wrote:

> Hello again.
> 
> My actual problem is that I don't know where to find the struct that holds 
> the information used to send messages to the procs.
> 
> Something like:
> 
> Rank       URI
> 0             21222:tcp:192.168.1.1:1250
> 1             21223:tcp:192.168.1.2:1250
> .....          .....
> 
> 
> What I need is to update it when I move a process from its original 
> node. Is there something like this?
> 
> Thanks a lot.
> 
> Hugo 
> 
> 2011/5/31 Hugo Meyer <meyer.h...@gmail.com>
> Hello all.
> 
> I need some help restarting communication with a process that I 
> restore on a different node. My situation is as follows:
> 
> The process fails and is restored successfully on another node from a 
> previous checkpoint that I sent there. Now, when a process tries to send a 
> message to this restored process, the send will fail, or at least it will 
> block in ompi_request_wait_completion. 
> 
> So, when this happens, I have to send a message to the sender's daemon, 
> which will have the URI of where the process has been restored. The daemon 
> answers the proc with this URI, and the proc updates its info.
> 
> So, I need to know where in the code I can capture this send attempt and 
> then send the message to the sender's daemon, and where and how I can update 
> this info to send the message to the right place (same rank but new URI).
> 
> I have to do it this way to avoid a collective communication.
> 
> If you can give me a hand with this, it would be great.
> 
> Best regards.
> 
> Hugo
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
