Hello again.

My actual problem is that I don't know where to find the struct that holds
the information used to send messages to the procs.

Something like:

Rank    URI
0       21222:tcp:192.168.1.1:1250
1       21223:tcp:192.168.1.2:1250
.....   .....


What I need is to update it when I move a process away from its original
node. Is there something like this?
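
As a rough illustration of the kind of table described above, here is a minimal sketch in C. The names peer_entry_t, peer_lookup and peer_update_uri are hypothetical and are not Open MPI symbols; the sketch only assumes that each rank maps to one URI string that can be overwritten after a process migrates.

```c
#include <assert.h>
#include <string.h>

#define URI_MAX 64  /* assumed upper bound on URI length, for illustration */

/* Hypothetical entry: one row of the Rank/URI table. */
typedef struct {
    int  rank;
    char uri[URI_MAX];
} peer_entry_t;

/* Return the entry for `rank`, or NULL if the rank is not in the table. */
static peer_entry_t *peer_lookup(peer_entry_t *table, size_t n, int rank)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].rank == rank) {
            return &table[i];
        }
    }
    return NULL;
}

/* Overwrite the URI stored for `rank` (same rank, new location).
 * Returns 0 on success, -1 if the rank is unknown or the URI is too long. */
static int peer_update_uri(peer_entry_t *table, size_t n, int rank,
                           const char *new_uri)
{
    peer_entry_t *entry = peer_lookup(table, n, rank);
    if (entry == NULL || strlen(new_uri) >= URI_MAX) {
        return -1;
    }
    strcpy(entry->uri, new_uri);  /* length was checked above */
    return 0;
}
```

With such a table, moving rank 1 to another host would amount to a call like peer_update_uri(table, 2, 1, "21223:tcp:192.168.1.3:1250"), where the new address is made up for the example.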

Thanks a lot.

Hugo

2011/5/31 Hugo Meyer <meyer.h...@gmail.com>

> Hello @ll.
>
> I need some help restarting communication with a process that I
> restore on a different node. My situation is as follows:
>
> The process fails and is restored successfully on another node from a
> previous checkpoint that I sent there. Now, when a process tries to send a
> message to this restored process, it fails, or at least it blocks
> in *ompi_request_wait_completion*.
> So, when this happens, I have to send a message to the sender's daemon,
> which will have the URI of where the process has been restored. The daemon
> answers the proc with this URI, and the proc updates its info.
>
> So, I need to know where in the code I can capture this send attempt and
> then send the message to the sender's daemon, and where and how I can update
> this info so the message goes to the right place (same rank but new URI).
>
> I have to do it this way to avoid a collective communication.
>
> If you could give me a hand with this, that would be great.
>
> Best regards.
>
> Hugo
>
