Thanks for your replies.

> After doing that, the MPI_Init procedure calls grpcomm.modex to distribute the data across all procs in the job. Unfortunately, being a collective, all procs must participate. In your case, you'll have to find a different way to do it. Upon receipt, each proc updates its own modex db to include the new info.
> Look in orte/mca/grpcomm/bad/grpcomm_bad_module.c at the modex function and follow that code thru the grpcomm/base functions to see how the modex info is retrieved, passed, and decoded on the far end.

I will take a look at this, Ralph, and let you know how it goes.

Today, while I was looking at the code with a colleague, he suggested trying to catch the error when sending data through the btl_tcp_endpoint, more precisely in mca_btl_tcp_frag_send, at the point where we write to the socket's fd. I have tried this, but when a process moves and tries to send a message, or another process tries to send a message to it, I cannot catch the failure in mca_btl_tcp_frag_send, and I don't know why. The write is supposed to fail when someone tries to send. Is there any other place where this error is caught?

If I do it this way, I suppose I can reset connections on demand. What do you think, is it a good idea? And after I detect this failure, I will try to update the modex db of that process from here. Is that OK?

Thanks,
Hugo

2011/6/3 Jeff Squyres <jsquy...@cisco.com>

> On Jun 3, 2011, at 10:12 AM, Ralph Castain wrote:
>
> > When an MPI proc calls MPI_Init, each btl pushes its contact info into the modex database - one example is the btl.tcp.1.7 info you found there. That entry is for the TCP btl, which is probably what you are looking for. There is no way for you to edit that data - each btl encodes it in its own way and then adds it to the modex.
>
> More specifically, whatever each entity puts into the modex is a blob that is only readable by other entities just like itself. For example, what one TCP BTL puts in the modex can really only be read by another TCP BTL. The contents of what the TCP BTL puts in there is an opaque binary blob from the modex's point of view.
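On the question above about not seeing the failure at the write: one possible reason, independent of Open MPI, is that a write on a TCP socket normally succeeds as soon as the kernel has buffered the data, so a vanished peer often shows up only on a later write (as EPIPE or ECONNRESET) or on a read. The following is only a sketch of how the errno from a send could be classified. The names send_status_t, classify_send_errno, try_send, and demo_send_to_closed_peer are hypothetical and are not Open MPI functions. It assumes Linux (MSG_NOSIGNAL), and the demo uses a local AF_UNIX socket pair, which reports the closed peer immediately, unlike a real TCP connection.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of Open MPI. */
typedef enum {
    SEND_OK,         /* the kernel accepted some bytes */
    SEND_RETRY,      /* transient condition: try again later */
    SEND_PEER_GONE,  /* connection is dead: a reconnect would be needed */
    SEND_ERROR       /* any other error */
} send_status_t;

/* Map an errno value from send()/writev() to one of the codes above. */
send_status_t classify_send_errno(int err)
{
    switch (err) {
    case EINTR:
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return SEND_RETRY;
    case EPIPE:
    case ECONNRESET:
    case ENOTCONN:
        return SEND_PEER_GONE;
    default:
        return SEND_ERROR;
    }
}

/* Send once without raising SIGPIPE and report what happened.
 * MSG_NOSIGNAL is assumed to be available (Linux). */
send_status_t try_send(int fd, const void *buf, size_t len, ssize_t *sent)
{
    assert(buf != NULL && sent != NULL);
    *sent = send(fd, buf, len, MSG_NOSIGNAL);
    if (*sent >= 0) {
        return SEND_OK;
    }
    return classify_send_errno(errno);
}

/* Demo: close one end of a local socket pair, then send on the other end. */
send_status_t demo_send_to_closed_peer(void)
{
    int sv[2];
    ssize_t sent;
    send_status_t st;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return SEND_ERROR;
    }
    close(sv[1]);                        /* the "peer" goes away */
    st = try_send(sv[0], "x", 1, &sent); /* fails with EPIPE on a local pair */
    close(sv[0]);
    return st;
}
```

If something like this were used around the write in the send path, a SEND_PEER_GONE result would be the point at which to tear down and re-establish the connection. With real TCP the first send after the peer disappears may still return SEND_OK, so the read side probably needs the same kind of check.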