Too bad this all happened so fast; otherwise ORNL would at least have joined the call to understand what is going to happen (since we maintain an RTE module). Any chance we could get a summary?
Thanks,

On May 1, 2014, at 2:40 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Just to report back to the list: the three of us discussed this at some
> length, and decided we like George's proposed solution. Looks like a good
> clean approach that provides flexibility for the future. So we will introduce
> it when the BTLs move down to OPAL as (a) George already has it implemented
> there, and (b) we don't really need it before then.
>
> Thanks George!
> Ralph
>
>
> On May 1, 2014, at 9:40 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> wrote:
>
>> Done!
>>
>> On May 1, 2014, at 11:22 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>>> Apparently we are good today at 2PM EST. Fire-up the webex ;)
>>>
>>> George.
>>>
>>> On May 1, 2014, at 10:35 , Jeff Squyres (jsquyres) <jsquy...@cisco.com>
>>> wrote:
>>>
>>>> http://doodle.com/hhm4yyr76ipcxgk2
>>>>
>>>>
>>>> On May 1, 2014, at 10:25 AM, Ralph Castain <r...@open-mpi.org>
>>>> wrote:
>>>>
>>>>> sure - might be faster that way :-)
>>>>>
>>>>> On May 1, 2014, at 6:59 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
>>>>> wrote:
>>>>>
>>>>>> Want to have a phone call/webex to discuss?
>>>>>>
>>>>>>
>>>>>> On May 1, 2014, at 9:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>
>>>>>>> The problem we'll have with BTLs in opal is going to revolve around
>>>>>>> that ompi_process_name_t and will occur in a number of places. I've
>>>>>>> been trying to grok George's statement about accessors and can't figure
>>>>>>> out a clean way to make that work IF every RTE gets to define the
>>>>>>> process name a different way.
>>>>>>>
>>>>>>> For example, suppose I define ompi_process_name_t to be a string. I can
>>>>>>> hash the string down to an opal_identifier_t, but that is a
>>>>>>> structureless 64-bit value - there is no concept of a jobid or vpid in
>>>>>>> it. So if you now want to extract a jobid for that identifier, the only
>>>>>>> way you can do it is to "up-call" back to the RTE to parse it.
>>>>>>>
>>>>>>> This means that every RTE would have to initialize OPAL with a
>>>>>>> registration of its opal_identifier parser function(s), which seems
>>>>>>> like a really ugly solution.
>>>>>>>
>>>>>>> Maybe it is time to shift the process identifier down to the opal
>>>>>>> layer? If we define opal_identifier_t to include the required
>>>>>>> jobid/vpid, perhaps adding a void* so someone can put whatever they
>>>>>>> want in it?
>>>>>>>
>>>>>>> Note that I'm not wild about extending the identifier size beyond
>>>>>>> 64-bits as the memory footprint issue is growing in concern, and I
>>>>>>> still haven't seen any real use-case proposed for extending it.
>>>>>>>
>>>>>>>
>>>>>>> On May 1, 2014, at 3:41 AM, Jeff Squyres (jsquyres)
>>>>>>> <jsquy...@cisco.com> wrote:
>>>>>>>
>>>>>>>> On Apr 30, 2014, at 10:01 PM, George Bosilca <bosi...@icl.utk.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Why do you need the ompi_process_name_t? Isn’t the opal_identifier_t
>>>>>>>>> enough to dig for the info of the peer into the opal_db?
>>>>>>>>
>>>>>>>>
>>>>>>>> At the moment, I use the ompi_process_name_t for RML sends/receives in
>>>>>>>> the usnic BTL. I know this will have to change when the BTLs move
>>>>>>>> down to OPAL (when is that going to happen, BTW?). So my future use
>>>>>>>> case may be somewhat moot.
>>>>>>>>
>>>>>>>> More detail
>>>>>>>> ===========
>>>>>>>>
>>>>>>>> "Why does the usnic BTL use RML sends/receives?", you ask.
>>>>>>>>
>>>>>>>> The reason is rooted in the fact that the usnic BTL uses an
>>>>>>>> unreliable, connectionless transport under the covers. We had some
>>>>>>>> customers have network misconfigurations that resulted in usnic
>>>>>>>> traffic not flowing properly (e.g., MTU mismatches in the network).
>>>>>>>> But since we don't have a connection-oriented underlying API that will
>>>>>>>> eventually timeout/fail to connect/etc.
>>>>>>>> when there's a problem with
>>>>>>>> the network configuration, we added a "connection validation" service
>>>>>>>> in the usnic BTL that fires up in a thread in the local rank 0 on each
>>>>>>>> server. This thread provides service to all the MPI processes on its
>>>>>>>> server.
>>>>>>>>
>>>>>>>> In short: the service thread sends UDP pings and ACKs to peer service
>>>>>>>> threads on other servers (upon demand/upon first send between servers)
>>>>>>>> to verify network connectivity. If the pings eventually fail/timeout
>>>>>>>> (i.e., don't get ACKs back), the service thread does a show_help and
>>>>>>>> kills the job.
>>>>>>>>
>>>>>>>> There's more details, but that's the gist of it.
>>>>>>>>
>>>>>>>> This basically gives us the ability to highlight problems in the
>>>>>>>> network and kill the MPI job rather than spin infinitely while trying
>>>>>>>> to deliver MPI/BTL messages to a peer that will never get there.
>>>>>>>>
>>>>>>>> Since this is really a server-to-server network connectivity issue
>>>>>>>> (vs. an MPI peer-to-peer connectivity issue), we only need to have one
>>>>>>>> service thread for a whole server. The other MPI procs on the server
>>>>>>>> use RML to talk to it. E.g., "Please ping the server where MPI proc X
>>>>>>>> lives," and so on. This seemed better than having a service thread in
>>>>>>>> each MPI process.
>>>>>>>>
>>>>>>>> We've thought a bit about what to do when the BTLs move down to OPAL
>>>>>>>> (since they won't be able to use RML any more), but don't have a final
>>>>>>>> solution yet... We do still want to be able to utilize this
>>>>>>>> capability even after the BTL move.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> jsquy...@cisco.com
>>>>>>>> For corporate legal information go to:
>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> de...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/05/14673.php
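For readers following the identifier discussion quoted above (a structureless 64-bit hash versus an opal_identifier_t that carries a jobid/vpid), here is a minimal sketch in C of one way a 64-bit value could hold both fields and expose them through accessors, so no up-call to the RTE is needed to extract a jobid. This is only an illustration under stated assumptions: the names example_identifier_t, example_id_pack, example_id_jobid, example_id_vpid and example_id_selfcheck are hypothetical, the 32/32 bit split is assumed, and nothing here is taken from the Open MPI code base or from the solution agreed on the call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 64-bit identifier with structure.
 * Assumed layout: upper 32 bits = jobid, lower 32 bits = vpid. */
typedef uint64_t example_identifier_t;

/* Combine a jobid and a vpid into one 64-bit identifier. */
static inline example_identifier_t example_id_pack(uint32_t jobid, uint32_t vpid)
{
    return ((uint64_t)jobid << 32) | (uint64_t)vpid;
}

/* Accessor: recover the jobid without consulting the runtime. */
static inline uint32_t example_id_jobid(example_identifier_t id)
{
    return (uint32_t)(id >> 32);
}

/* Accessor: recover the vpid without consulting the runtime. */
static inline uint32_t example_id_vpid(example_identifier_t id)
{
    return (uint32_t)(id & 0xffffffffu);
}

/* Small round-trip check: packing and then unpacking returns the inputs. */
void example_id_selfcheck(void)
{
    example_identifier_t id = example_id_pack(42u, 7u);
    assert(example_id_jobid(id) == 42u);
    assert(example_id_vpid(id) == 7u);
}
```

The point of the sketch is the contrast with a hashed string: a hash of a name cannot be split back into jobid and vpid, whereas a packed value stays within 64 bits and still lets lower layers read either field directly.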