The problem we'll have with the BTLs in OPAL is going to revolve around that ompi_process_name_t, and it will occur in a number of places. I've been trying to grok George's statement about accessors, and I can't figure out a clean way to make that work IF every RTE gets to define the process name a different way.
For example, suppose I define ompi_process_name_t to be a string. I can hash the string down to an opal_identifier_t, but that is a structureless 64-bit value - there is no concept of a jobid or vpid in it. So if you now want to extract a jobid from that identifier, the only way you can do it is to "up-call" back to the RTE to parse it. This means that every RTE would have to initialize OPAL with a registration of its opal_identifier parser function(s), which seems like a really ugly solution.

Maybe it is time to shift the process identifier down to the OPAL layer? We could define opal_identifier_t to include the required jobid/vpid, perhaps adding a void* so someone can put whatever they want in it (a rough sketch of both options is at the bottom of this mail). Note that I'm not wild about extending the identifier size beyond 64 bits, as the memory footprint issue is a growing concern, and I still haven't seen any real use-case proposed for extending it.

On May 1, 2014, at 3:41 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Apr 30, 2014, at 10:01 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> Why do you need the ompi_process_name_t? Isn't the opal_identifier_t enough
>> to dig for the info of the peer into the opal_db?
>
> At the moment, I use the ompi_process_name_t for RML sends/receives in the
> usnic BTL. I know this will have to change when the BTLs move down to OPAL
> (when is that going to happen, BTW?). So my future use case may be somewhat
> moot.
>
> More detail
> ===========
>
> "Why does the usnic BTL use RML sends/receives?", you ask.
>
> The reason is rooted in the fact that the usnic BTL uses an unreliable,
> connectionless transport under the covers. We had some customers with
> network misconfigurations that resulted in usnic traffic not flowing
> properly (e.g., MTU mismatches in the network). But since we don't have a
> connection-oriented underlying API that will eventually timeout/fail to
> connect/etc. when there's a problem with the network configuration, we added
> a "connection validation" service in the usnic BTL that fires up in a thread
> in the local rank 0 on each server. This thread provides service to all the
> MPI processes on its server.
>
> In short: the service thread sends UDP pings and ACKs to peer service
> threads on other servers (upon demand/upon first send between servers) to
> verify network connectivity. If the pings eventually fail/time out (i.e.,
> don't get ACKs back), the service thread does a show_help and kills the job.
>
> There are more details, but that's the gist of it.
>
> This basically gives us the ability to highlight problems in the network and
> kill the MPI job rather than spin infinitely while trying to deliver MPI/BTL
> messages to a peer that will never get there.
>
> Since this is really a server-to-server network connectivity issue (vs. an
> MPI peer-to-peer connectivity issue), we only need to have one service
> thread for a whole server. The other MPI procs on the server use RML to talk
> to it. E.g., "Please ping the server where MPI proc X lives," and so on.
> This seemed better than having a service thread in each MPI process.
>
> We've thought a bit about what to do when the BTLs move down to OPAL (since
> they won't be able to use RML any more), but don't have a final solution
> yet... We do still want to be able to utilize this capability even after the
> BTL move.
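
To make the trade-off concrete, here's a rough sketch of the two options I'm talking about. The names (opal_proc_ident_t, opal_ident_parse_fn_t, etc.) are invented for this mail - this is only an illustration, not proposed API.

    /* Illustrative only -- these names are made up for this mail. */
    #include <stdint.h>

    typedef uint32_t opal_jobid_t;
    typedef uint32_t opal_vpid_t;

    /* Option A: give the 64-bit identifier structure at the OPAL level,
     * so a BTL can pull out the jobid/vpid without up-calling to the RTE.
     * (Adding a void* for RTE-private data would work too, but it pushes
     * the footprint past 64 bits, which is the part I'm not wild about.) */
    typedef struct {
        opal_jobid_t jobid;
        opal_vpid_t  vpid;
    } opal_proc_ident_t;                  /* still 64 bits total */

    static inline opal_jobid_t opal_ident_jobid(opal_proc_ident_t id) {
        return id.jobid;
    }
    static inline opal_vpid_t opal_ident_vpid(opal_proc_ident_t id) {
        return id.vpid;
    }

    /* Option B: the identifier stays an opaque 64-bit hash, and every RTE
     * registers a parser at init time so OPAL can recover the jobid/vpid
     * by up-calling -- the part that strikes me as really ugly. */
    typedef uint64_t opal_identifier_t;
    typedef int (*opal_ident_parse_fn_t)(opal_identifier_t id,
                                         opal_jobid_t *jobid,
                                         opal_vpid_t *vpid);
    extern opal_ident_parse_fn_t opal_ident_parser;   /* set by the RTE */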
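For reference, here is a bare-bones illustration of the kind of UDP ping/ACK probe Jeff describes above. To be clear, this is NOT the usnic BTL code - just a generic retry/timeout loop showing the control flow that will need a new home once RML is off the table for the BTLs.

    /* Hypothetical sketch, not the actual usnic connectivity checker. */
    #include <stdbool.h>
    #include <sys/socket.h>
    #include <sys/time.h>

    #define PROBE_RETRIES   5
    #define PROBE_TIMEOUT_S 2

    /* Send a ping datagram to a peer's service thread and wait for an ACK.
     * Returns true if the peer answered within the retry/timeout budget. */
    static bool probe_peer(int sd, const struct sockaddr *peer, socklen_t plen)
    {
        const char ping = 'P';
        char ack;

        /* Bound each recv so a silent network shows up as a failure
         * instead of an infinite wait. */
        struct timeval tv = { .tv_sec = PROBE_TIMEOUT_S, .tv_usec = 0 };
        setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        for (int i = 0; i < PROBE_RETRIES; ++i) {
            if (sendto(sd, &ping, sizeof(ping), 0, peer, plen) < 0) {
                continue;                        /* transient error: retry */
            }
            if (recv(sd, &ack, sizeof(ack), 0) > 0 && 'A' == ack) {
                return true;                     /* got the ACK back */
            }
            /* Timed out or got stray data: retry. Real code would also
             * match the ACK to a sequence number and the expected sender. */
        }
        return false;   /* caller would show_help() and kill the job here */
    }

Note that however we end up exposing jobid/vpid at the OPAL layer, that same identifier is what the per-server service thread would have to key its "please ping the server where proc X lives" requests on once they can no longer ride over RML.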