Thanks for your replies.

> After doing that, the MPI_Init procedure calls grpcomm.modex to distribute
> the data across all procs in the job. Unfortunately, being a collective, all
> procs must participate. In your case, you'll have to find a different way to
> do it. Upon receipt, each proc updates its own modex db to include the new
> info.

> Look in orte/mca/grpcomm/bad/grpcomm_bad_module.c at the modex function and
> follow that code thru the grpcomm/base functions to see how the modex info
> is retrieved, passed, and decoded on the far end.

I will take a look at this, Ralph, and let you know how it goes. But today,
while looking at the code with a colleague, he suggested that I try to capture
an error when sending data through the btl_tcp_endpoint, more precisely in
mca_btl_tcp_frag_send, at the point where we write to the fd of the socket.
I have tried this, but when a process moves and tries to send a message, or
another process tries to send a message to it, I cannot catch the moment of
the failure in mca_btl_tcp_frag_send, and I don't know why. The write is
supposed to fail when someone tries to send. Is there any other place where
this error is captured? If I do it this way, I suppose I can reset connections
on demand. What do you think of this? Is it a good idea? And after I detect
this failure, I will try to update the modex db of that process from there.
Is that OK?
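
For reference, here is a minimal sketch of the kind of check described above:
classifying the result of a write() on a socket fd so that a dead peer can be
told apart from a transient condition. It is not Open MPI code. The names
send_status_t, try_send and demo_write_after_peer_close are hypothetical, and
the demo uses a local AF_UNIX socketpair, where the error shows up on the very
first write. On a real TCP connection the kernel may accept the first write
after the peer disappears and only report EPIPE or ECONNRESET on a later one,
so a check like this may not trigger at the first send.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for one write attempt (not an Open MPI type). */
typedef enum {
    SEND_OK,        /* write() accepted zero or more bytes */
    SEND_RETRY,     /* transient condition, try again later */
    SEND_PEER_GONE, /* the connection is no longer usable */
    SEND_ERROR      /* any other failure */
} send_status_t;

/* Attempt one write() on fd and classify the outcome from errno. */
static send_status_t try_send(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);

    ssize_t n = write(fd, buf, len);
    if (n >= 0) {
        return SEND_OK;
    }
    if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) {
        return SEND_RETRY;
    }
    if (errno == EPIPE || errno == ECONNRESET) {
        return SEND_PEER_GONE;
    }
    return SEND_ERROR;
}

/* Demo: create a connected local socket pair, close one end,
 * then write on the other end and report what try_send() saw. */
static send_status_t demo_write_after_peer_close(void)
{
    int sv[2];
    const char msg[] = "ping";

    /* Without this, the failing write would raise SIGPIPE and
     * terminate the process instead of returning EPIPE. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return SEND_ERROR;
    }
    close(sv[1]); /* simulate the peer going away */

    send_status_t st = try_send(sv[0], msg, sizeof msg);
    close(sv[0]);
    return st;
}
```

Calling demo_write_after_peer_close() from a main() should return
SEND_PEER_GONE on a POSIX system. A SEND_PEER_GONE result would be the point
at which a connection reset could be triggered.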

Thanks

Hugo



2011/6/3 Jeff Squyres <jsquy...@cisco.com>

> On Jun 3, 2011, at 10:12 AM, Ralph Castain wrote:
>
> > When an MPI proc calls MPI_Init, each btl pushes its contact info into
> the modex database - one example is the btl.tcp.1.7 info you found there.
> That entry is for the TCP btl, which is probably what you are looking for.
> There is no way for you to edit that data - each btl encodes it in its own
> way and then adds it to the modex.
>
> More specifically, whatever each entity puts into the modex is a blob that
> is only readable by other entities just like itself.  For example, what one
> TCP BTL puts in the modex can really only be read by another TCP BTL. The
> contents of what the TCP BTL puts in there is an opaque binary blob from the
> modex's point of view.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
