Too bad all this happened so fast; otherwise ORNL would have at least 
participated in the call to understand what is going to happen (since we 
maintain an RTE module). Any chance we could have a summary?

Thanks,


On May 1, 2014, at 2:40 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Just to report back to the list: the three of us discussed this at some 
> length, and decided we like George's proposed solution. Looks like a good 
> clean approach that provides flexibility for the future. So we will introduce 
> it when the BTLs move down to OPAL as (a) George already has it implemented 
> there, and (b) we don't really need it before then.
> 
> Thanks George!
> Ralph
> 
> 
> On May 1, 2014, at 9:40 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> 
>> Done!
>> 
>> On May 1, 2014, at 11:22 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> 
>>> Apparently we are good today at 2PM EST. Fire up the webex ;)
>>> 
>>> George.
>>> 
>>> On May 1, 2014, at 10:35 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>> wrote:
>>> 
>>>> http://doodle.com/hhm4yyr76ipcxgk2
>>>> 
>>>> 
>>>> On May 1, 2014, at 10:25 AM, Ralph Castain <r...@open-mpi.org>
>>>> wrote:
>>>> 
>>>>> sure - might be faster that way :-)
>>>>> 
>>>>> On May 1, 2014, at 6:59 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>>>> wrote:
>>>>> 
>>>>>> Want to have a phone call/webex to discuss?
>>>>>> 
>>>>>> 
>>>>>> On May 1, 2014, at 9:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> 
>>>>>>> The problem we'll have with BTLs in opal is going to revolve around 
>>>>>>> that ompi_process_name_t and will occur in a number of places. I've 
>>>>>>> been trying to grok George's statement about accessors and can't figure 
>>>>>>> out a clean way to make that work IF every RTE gets to define the 
>>>>>>> process name a different way.
>>>>>>> 
>>>>>>> For example, suppose I define ompi_process_name_t to be a string. I can 
>>>>>>> hash the string down to an opal_identifier_t, but that is a 
>>>>>>> structureless 64-bit value - there is no concept of a jobid or vpid in 
>>>>>>> it. So if you now want to extract a jobid for that identifier, the only 
>>>>>>> way you can do it is to "up-call" back to the RTE to parse it.
>>>>>>> 
>>>>>>> This means that every RTE would have to initialize OPAL with a 
>>>>>>> registration of its opal_identifier parser function(s), which seems 
>>>>>>> like a really ugly solution.
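
To make the up-call concern concrete, here is a minimal sketch in C. Every name in it (sketch_identifier_t, sketch_register_parser, the rte_* helpers, the "job<N>.rank<M>" name format) is hypothetical and is not the actual OPAL or RTE API. It assumes an RTE whose process names are strings, hashed with FNV-1a into an opaque 64-bit value, so the lower layer can only recover a jobid by calling back into a parser the RTE registered at init time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for opal_identifier_t: an opaque 64-bit value. */
typedef uint64_t sketch_identifier_t;

/* Hypothetical callback type: the RTE decodes a jobid from an identifier. */
typedef int (*sketch_jobid_parser_fn)(sketch_identifier_t id, uint32_t *jobid);

static sketch_jobid_parser_fn registered_parser = NULL;

/* Lower layer: remember the parser the RTE supplies during init. */
static void sketch_register_parser(sketch_jobid_parser_fn fn)
{
    assert(NULL != fn);
    registered_parser = fn;
}

/* Lower layer: the identifier has no structure here, so up-call to the RTE. */
static int sketch_get_jobid(sketch_identifier_t id, uint32_t *jobid)
{
    if (NULL == registered_parser) {
        return -1;
    }
    return registered_parser(id, jobid);
}

/* RTE side: 64-bit FNV-1a hash of a string process name. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    while ('\0' != *s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;
    }
    return h;
}

/* RTE side: the hash is one-way, so the RTE keeps a table of known names. */
#define SKETCH_MAX_NAMES 8
static const char *rte_names[SKETCH_MAX_NAMES];
static sketch_identifier_t rte_ids[SKETCH_MAX_NAMES];
static int rte_count = 0;

static sketch_identifier_t rte_make_identifier(const char *name)
{
    sketch_identifier_t id = fnv1a64(name);
    if (rte_count < SKETCH_MAX_NAMES) {
        rte_names[rte_count] = name;
        rte_ids[rte_count] = id;
        rte_count++;
    }
    return id;
}

/* RTE side: look the identifier up, then parse the jobid out of the name. */
static int rte_parse_jobid(sketch_identifier_t id, uint32_t *jobid)
{
    for (int i = 0; i < rte_count; i++) {
        if (rte_ids[i] == id) {
            unsigned int j;
            if (1 == sscanf(rte_names[i], "job%u.", &j)) {
                *jobid = (uint32_t)j;
                return 0;
            }
            return -1;
        }
    }
    return -1;
}
```

In this sketch the RTE calls sketch_register_parser(rte_parse_jobid) once, and every later sketch_get_jobid() call crosses back up into the RTE, which is the coupling described above.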
>>>>>>> 
>>>>>>> Maybe it is time to shift the process identifier down to the opal 
>>>>>>> layer? If we define opal_identifier_t to include the required 
>>>>>>> jobid/vpid, perhaps adding a void* so someone can put whatever they 
>>>>>>> want in it?
>>>>>>> 
>>>>>>> Note that I'm not wild about extending the identifier size beyond 
>>>>>>> 64-bits as the memory footprint issue is growing in concern, and I 
>>>>>>> still haven't seen any real use-case proposed for extending it.
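
As a rough illustration of an identifier that carries jobid/vpid while staying at 64 bits, the following C sketch packs two 32-bit fields into one 64-bit value. The type and function names are hypothetical, and the 32/32 split is an assumption rather than the layout that was adopted. The optional void* mentioned above is left out on purpose, since adding it would push the identifier past 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, not the actual OPAL definition. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} sketch_process_name_t;

/* Guard the memory footprint: the name must stay at 64 bits. */
_Static_assert(sizeof(sketch_process_name_t) == sizeof(uint64_t),
               "process name must remain 64 bits");

/* Pack jobid into the high 32 bits and vpid into the low 32 bits. */
static inline uint64_t sketch_name_pack(sketch_process_name_t n)
{
    return ((uint64_t)n.jobid << 32) | (uint64_t)n.vpid;
}

/* Accessors work on the packed value directly, with no up-call needed. */
static inline uint32_t sketch_id_jobid(uint64_t id)
{
    return (uint32_t)(id >> 32);
}

static inline uint32_t sketch_id_vpid(uint64_t id)
{
    return (uint32_t)(id & 0xffffffffu);
}

/* Recover the structured name from the packed 64-bit value. */
static inline sketch_process_name_t sketch_name_unpack(uint64_t id)
{
    sketch_process_name_t n;
    n.jobid = sketch_id_jobid(id);
    n.vpid = sketch_id_vpid(id);
    return n;
}
```

Because the fields sit at fixed bit positions, any layer can extract the jobid or vpid without asking the RTE to parse anything.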
>>>>>>> 
>>>>>>> 
>>>>>>> On May 1, 2014, at 3:41 AM, Jeff Squyres (jsquyres) 
>>>>>>> <jsquy...@cisco.com> wrote:
>>>>>>> 
>>>>>>>> On Apr 30, 2014, at 10:01 PM, George Bosilca <bosi...@icl.utk.edu> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Why do you need the ompi_process_name_t? Isn’t the opal_identifier_t 
>>>>>>>>> enough to dig for the info of the peer into the opal_db?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> At the moment, I use the ompi_process_name_t for RML sends/receives in 
>>>>>>>> the usnic BTL.  I know this will have to change when the BTLs move 
>>>>>>>> down to OPAL (when is that going to happen, BTW?).  So my future use 
>>>>>>>> case may be somewhat moot.
>>>>>>>> 
>>>>>>>> More detail
>>>>>>>> ===========
>>>>>>>> 
>>>>>>>> "Why does the usnic BTL use RML sends/receives?", you ask.
>>>>>>>> 
>>>>>>>> The reason is rooted in the fact that the usnic BTL uses an 
>>>>>>>> unreliable, connectionless transport under the covers.  Some 
>>>>>>>> customers had network misconfigurations that resulted in usnic 
>>>>>>>> traffic not flowing properly (e.g., MTU mismatches in the network).  
>>>>>>>> But since we don't have a connection-oriented underlying API that will 
>>>>>>>> eventually timeout/fail to connect/etc. when there's a problem with 
>>>>>>>> the network configuration, we added a "connection validation" service 
>>>>>>>> in the usnic BTL that fires up in a thread in the local rank 0 on each 
>>>>>>>> server.  This thread provides service to all the MPI processes on its 
>>>>>>>> server.
>>>>>>>> 
>>>>>>>> In short: the service thread sends UDP pings and ACKs to peer service 
>>>>>>>> threads on other servers (upon demand/upon first send between servers) 
>>>>>>>> to verify network connectivity.  If the pings eventually fail/timeout 
>>>>>>>> (i.e., don't get ACKs back), the service thread does a show_help and 
>>>>>>>> kills the job.
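
The ping/ACK check described above can be sketched as a bounded retry loop in C. This is only an illustration under stated assumptions: the transport hooks, the retry count, and the fake lossy link are all hypothetical and are not taken from the usnic BTL source. A real service would wrap sendto()/recvfrom() on a UDP socket with a receive timeout behind the two hooks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport hooks standing in for the UDP send/receive path. */
typedef struct {
    bool (*send_ping)(void *ctx, int seq);
    bool (*wait_ack)(void *ctx, int seq, int timeout_ms);
    void *ctx;
} sketch_transport_t;

/*
 * Send up to max_tries pings, waiting timeout_ms for each ACK.
 * Returns the number of pings sent when an ACK arrives, or -1 if every
 * attempt timed out (the point where the caller would report and abort).
 */
static int sketch_check_connectivity(const sketch_transport_t *t,
                                     int max_tries, int timeout_ms)
{
    assert(NULL != t && max_tries > 0);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        if (!t->send_ping(t->ctx, attempt)) {
            continue; /* local send failure: count the attempt and retry */
        }
        if (t->wait_ack(t->ctx, attempt, timeout_ms)) {
            return attempt + 1;
        }
    }
    return -1;
}

/* Fake link for demonstration: ACKs only after drop_first pings were lost. */
typedef struct {
    int drop_first;
    int sent;
} sketch_fake_link_t;

static bool fake_send(void *ctx, int seq)
{
    sketch_fake_link_t *link = ctx;
    (void)seq;
    link->sent++;
    return true;
}

static bool fake_wait(void *ctx, int seq, int timeout_ms)
{
    sketch_fake_link_t *link = ctx;
    (void)seq;
    (void)timeout_ms;
    return link->sent > link->drop_first;
}
```

With a fake link that drops the first two pings, the check succeeds on the third attempt; with a link that never ACKs, it returns -1 after max_tries, which corresponds to the show_help-and-abort path.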
>>>>>>>> 
>>>>>>>> There are more details, but that's the gist of it.
>>>>>>>> 
>>>>>>>> This basically gives us the ability to highlight problems in the 
>>>>>>>> network and kill the MPI job rather than spin forever trying to 
>>>>>>>> deliver MPI/BTL messages that will never reach their peer.
>>>>>>>> 
>>>>>>>> Since this is really a server-to-server network connectivity issue 
>>>>>>>> (vs. an MPI peer-to-peer connectivity issue), we only need to have one 
>>>>>>>> service thread for a whole server.  The other MPI procs on the server 
>>>>>>>> use RML to talk to it.  E.g., "Please ping the server where MPI proc X 
>>>>>>>> lives," and so on.  This seemed better than having a service thread in 
>>>>>>>> each MPI process.
>>>>>>>> 
>>>>>>>> We've thought a bit about what to do when the BTLs move down to OPAL 
>>>>>>>> (since they won't be able to use RML any more), but don't have a final 
>>>>>>>> solution yet...  We do still want to be able to utilize this 
>>>>>>>> capability even after the BTL move.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> jsquy...@cisco.com
>>>>>>>> For corporate legal information go to: 
>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> de...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>> Link to this post: 
>>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/05/14673.php
