On Apr 30, 2014, at 10:01 PM, George Bosilca <[email protected]> wrote:

> Why do you need the ompi_process_name_t? Isn’t the opal_identifier_t enough 
> to dig for the info of the peer into the opal_db?


At the moment, I use the ompi_process_name_t for RML sends/receives in the 
usnic BTL.  I know this will have to change when the BTLs move down to OPAL 
(when is that going to happen, BTW?).  So my future use case may be somewhat 
moot.
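
For concreteness, here's roughly what that RML usage looks like -- a
from-memory sketch with a placeholder tag and function names, using the
trunk-style non-blocking RML calls, not the actual usnic BTL code:

    /* Sketch only: RML sends/receives are addressed by full process name,
     * which is why the BTL needs ompi_process_name_t (effectively an
     * orte_process_name_t) and not just the opal_identifier_t it uses for
     * opal_db lookups. */
    #include "opal/dss/dss.h"
    #include "orte/mca/rml/rml.h"
    #include "orte/util/name_fns.h"

    #define USNIC_EXAMPLE_TAG 4242      /* placeholder RML tag */

    /* Callback invoked when a message arrives on our tag; note that the
     * sender is identified by an orte_process_name_t */
    static void example_recv_cb(int status, orte_process_name_t *sender,
                                opal_buffer_t *buffer, orte_rml_tag_t tag,
                                void *cbdata)
    {
        /* unpack and handle the message (not shown) */
    }

    static void example_setup(void)
    {
        /* Persistent non-blocking receive: any sender, our private tag */
        orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD, USNIC_EXAMPLE_TAG,
                                ORTE_RML_PERSISTENT, example_recv_cb, NULL);
    }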

More detail
===========

"Why does the usnic BTL use RML sends/receives?", you ask.

The reason is rooted in the fact that the usnic BTL uses an unreliable, 
connectionless transport under the covers.  We have had customers with network 
misconfigurations that resulted in usnic traffic not flowing properly (e.g., 
MTU mismatches in the network).  But since we don't have a connection-oriented 
underlying API that will eventually time out / fail to connect / etc. when there's 
a problem with the network configuration, we added a "connection validation" 
service in the usnic BTL that fires up in a thread in the local rank 0 on each 
server.  This thread provides the service to all the MPI processes on its server.

In short: the service thread sends UDP pings and ACKs to peer service threads 
on other servers (on demand, i.e., upon the first send between a pair of 
servers) to verify network connectivity.  If the pings eventually fail / time 
out (i.e., we never get ACKs back), the service thread does a show_help and 
kills the job.
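
To make the timeout behavior concrete, here's a minimal sketch of the
agent-side retry logic; the struct, constants, and show_help file/topic are
made up for illustration and this is not the actual usnic code:

    /* Sketch only -- not the actual usnic connectivity checker. */
    #include <stdbool.h>
    #include "opal/util/show_help.h"

    #define MAX_PING_ATTEMPTS 5              /* placeholder retry limit */

    typedef struct {
        int  attempts;                       /* pings sent without an ACK */
        bool acked;                          /* did an ACK ever come back? */
        char peer_server[64];                /* hostname we are checking */
    } ping_state_t;

    /* Called in the service thread when a ping's retransmit timer fires */
    static void ping_timeout(ping_state_t *ps)
    {
        if (ps->acked) {
            return;                          /* connectivity already verified */
        }
        if (++ps->attempts < MAX_PING_ATTEMPTS) {
            /* resend the UDP ping and re-arm the timer (not shown) */
            return;
        }
        /* Out of retries: the path to that server is misconfigured or down.
         * Emit a friendly show_help message and abort the job, rather than
         * letting the BTL retransmit forever. */
        opal_show_help("help-connectivity.txt", "server unreachable",
                       true, ps->peer_server);
        /* ...and then kill the job (the real agent does this via the RTE) */
    }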

There are more details, but that's the gist of it.

This basically gives us the ability to highlight problems in the network and 
kill the MPI job, rather than spinning infinitely trying to deliver MPI/BTL 
messages that will never arrive.

Since this is really a server-to-server network connectivity issue (vs. an MPI 
peer-to-peer connectivity issue), we only need to have one service thread for a 
whole server.  The other MPI procs on the server use RML to talk to it.  E.g., 
"Please ping the server where MPI proc X lives," and so on.  This seemed better 
than having a service thread in each MPI process.
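
A sketch of that request, again from memory with placeholder names (how the
client finds the local rank 0's process name is elided):

    /* Sketch only: ask the on-server service thread (in local rank 0) to
     * ping the server where "peer" lives.  agent_name is the process name
     * of local rank 0; USNIC_EXAMPLE_TAG is the placeholder tag from the
     * earlier sketch. */
    static void example_send_done(int status, orte_process_name_t *peer,
                                  opal_buffer_t *buffer, orte_rml_tag_t tag,
                                  void *cbdata)
    {
        OBJ_RELEASE(buffer);         /* send is complete; free the buffer */
    }

    static int request_ping(orte_process_name_t *agent_name,
                            ompi_process_name_t *peer)
    {
        opal_buffer_t *buf = OBJ_NEW(opal_buffer_t);

        /* Pack the peer's full process name so the agent knows which
         * server to ping (ORTE_NAME is the dss type for process names) */
        opal_dss.pack(buf, peer, 1, ORTE_NAME);

        /* Non-blocking send, addressed to the agent by process name */
        return orte_rml.send_buffer_nb(agent_name, buf, USNIC_EXAMPLE_TAG,
                                       example_send_done, NULL);
    }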

We've thought a bit about what to do when the BTLs move down to OPAL (since 
they won't be able to use RML any more), but we don't have a final solution yet.  
We do still want to be able to use this capability even after the BTL move.

-- 
Jeff Squyres
[email protected]
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
