Too bad all this happened so fast; otherwise ORNL would at least have 
participated in the call to understand what is going to happen (since we 
maintain an RTE module). Any chance we could have a summary?

Thanks,


On May 1, 2014, at 2:40 PM, Ralph Castain <[email protected]> wrote:

> Just to report back to the list: the three of us discussed this at some 
> length, and decided we like George's proposed solution. Looks like a good 
> clean approach that provides flexibility for the future. So we will introduce 
> it when the BTLs move down to OPAL as (a) George already has it implemented 
> there, and (b) we don't really need it before then.
> 
> Thanks George!
> Ralph
> 
> 
> On May 1, 2014, at 9:40 AM, Jeff Squyres (jsquyres) <[email protected]> 
> wrote:
> 
>> Done!
>> 
>> On May 1, 2014, at 11:22 AM, George Bosilca <[email protected]> wrote:
>> 
>>> Apparently we are good today at 2PM EST. Fire up the webex ;)
>>> 
>>> George.
>>> 
>>> On May 1, 2014, at 10:35 , Jeff Squyres (jsquyres) <[email protected]> 
>>> wrote:
>>> 
>>>> http://doodle.com/hhm4yyr76ipcxgk2
>>>> 
>>>> 
>>>> On May 1, 2014, at 10:25 AM, Ralph Castain <[email protected]>
>>>> wrote:
>>>> 
>>>>> sure - might be faster that way :-)
>>>>> 
>>>>> On May 1, 2014, at 6:59 AM, Jeff Squyres (jsquyres) <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>>> Want to have a phone call/webex to discuss?
>>>>>> 
>>>>>> 
>>>>>> On May 1, 2014, at 9:43 AM, Ralph Castain <[email protected]> wrote:
>>>>>> 
>>>>>>> The problem we'll have with BTLs in opal is going to revolve around 
>>>>>>> that ompi_process_name_t and will occur in a number of places. I've 
>>>>>>> been trying to grok George's statement about accessors and can't figure 
>>>>>>> out a clean way to make that work IF every RTE gets to define the 
>>>>>>> process name a different way.
>>>>>>> 
>>>>>>> For example, suppose I define ompi_process_name_t to be a string. I can 
>>>>>>> hash the string down to an opal_identifier_t, but that is a 
>>>>>>> structureless 64-bit value - there is no concept of a jobid or vpid in 
>>>>>>> it. So if you now want to extract a jobid for that identifier, the only 
>>>>>>> way you can do it is to "up-call" back to the RTE to parse it.
>>>>>>> 
>>>>>>> This means that every RTE would have to initialize OPAL with a 
>>>>>>> registration of its opal_identifier parser function(s), which seems 
>>>>>>> like a really ugly solution.
>>>>>>> 
>>>>>>> Maybe it is time to shift the process identifier down to the opal 
>>>>>>> layer? If we define opal_identifier_t to include the required 
>>>>>>> jobid/vpid, perhaps adding a void* so someone can put whatever they 
>>>>>>> want in it?
>>>>>>> 
>>>>>>> Note that I'm not wild about extending the identifier size beyond 
>>>>>>> 64 bits, as the memory footprint is a growing concern, and I 
>>>>>>> still haven't seen any real use-case proposed for extending it.
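
To make the structured 64-bit identifier idea above concrete, here is a minimal sketch, assuming a 32-bit jobid and a 32-bit vpid packed into a single 64-bit value. Every name in it (sketch_identifier_t, sketch_id_pack, and so on) is hypothetical and is not the actual OPAL definition. It only illustrates that accessors can recover both fields locally, with no up-call to the RTE and no growth beyond 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a structured 64-bit process identifier.
 * This is NOT the real opal_identifier_t; the 32/32 split is an assumption. */
typedef uint64_t sketch_identifier_t;

/* Pack jobid into the upper 32 bits and vpid into the lower 32 bits. */
static inline sketch_identifier_t sketch_id_pack(uint32_t jobid, uint32_t vpid)
{
    return ((uint64_t)jobid << 32) | (uint64_t)vpid;
}

/* Accessor: recover the jobid without consulting the RTE. */
static inline uint32_t sketch_id_jobid(sketch_identifier_t id)
{
    return (uint32_t)(id >> 32);
}

/* Accessor: recover the vpid without consulting the RTE. */
static inline uint32_t sketch_id_vpid(sketch_identifier_t id)
{
    return (uint32_t)(id & 0xffffffffu);
}

/* Round-trip self-check: aborts via assert() if packing loses information. */
static inline void sketch_id_check(uint32_t jobid, uint32_t vpid)
{
    sketch_identifier_t id = sketch_id_pack(jobid, vpid);
    assert(sketch_id_jobid(id) == jobid);
    assert(sketch_id_vpid(id) == vpid);
}
```

A string-hashed identifier, by contrast, has no such layout, which is why extracting a jobid from it would require the RTE to parse it.
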
>>>>>>> 
>>>>>>> 
>>>>>>> On May 1, 2014, at 3:41 AM, Jeff Squyres (jsquyres) 
>>>>>>> <[email protected]> wrote:
>>>>>>> 
>>>>>>>> On Apr 30, 2014, at 10:01 PM, George Bosilca <[email protected]> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Why do you need the ompi_process_name_t? Isn’t the opal_identifier_t 
>>>>>>>>> enough to dig the peer's info out of the opal_db?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> At the moment, I use the ompi_process_name_t for RML sends/receives in 
>>>>>>>> the usnic BTL.  I know this will have to change when the BTLs move 
>>>>>>>> down to OPAL (when is that going to happen, BTW?).  So my future use 
>>>>>>>> case may be somewhat moot.
>>>>>>>> 
>>>>>>>> More detail
>>>>>>>> ===========
>>>>>>>> 
>>>>>>>> "Why does the usnic BTL use RML sends/receives?", you ask.
>>>>>>>> 
>>>>>>>> The reason is rooted in the fact that the usnic BTL uses an 
>>>>>>>> unreliable, connectionless transport under the covers.  Some 
>>>>>>>> customers had network misconfigurations that resulted in usnic 
>>>>>>>> traffic not flowing properly (e.g., MTU mismatches in the network).  
>>>>>>>> But since we don't have a connection-oriented underlying API that will 
>>>>>>>> eventually timeout/fail to connect/etc. when there's a problem with 
>>>>>>>> the network configuration, we added a "connection validation" service 
>>>>>>>> in the usnic BTL that fires up in a thread in the local rank 0 on each 
>>>>>>>> server.  This thread provides service to all the MPI processes on its 
>>>>>>>> server.
>>>>>>>> 
>>>>>>>> In short: the service thread sends UDP pings and ACKs to peer service 
>>>>>>>> threads on other servers (upon demand/upon first send between servers) 
>>>>>>>> to verify network connectivity.  If the pings eventually fail/timeout 
>>>>>>>> (i.e., don't get ACKs back), the service thread does a show_help and 
>>>>>>>> kills the job.
>>>>>>>> 
>>>>>>>> There are more details, but that's the gist of it.
>>>>>>>> 
>>>>>>>> This basically gives us the ability to highlight problems in the 
>>>>>>>> network and kill the MPI job rather than spin infinitely while trying 
>>>>>>>> to deliver MPI/BTL messages that will never reach the peer.
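
The ping/ACK validation described above can be sketched roughly as a bounded retry loop. This is only an illustration under stated assumptions: the names (sketch_validate_peer, sketch_ping_fn, sketch_fake_ping) are hypothetical, the transport is abstracted behind a callback rather than real UDP sockets, and it is not the usnic BTL's actual code. On an unreachable result, the real service would report the problem and abort the job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport hook: send one ping and wait up to timeout_ms
 * for the matching ACK.  Returns true if the ACK arrived in time. */
typedef bool (*sketch_ping_fn)(void *ctx, int timeout_ms);

typedef enum {
    SKETCH_PEER_OK,          /* an ACK came back: connectivity verified */
    SKETCH_PEER_UNREACHABLE  /* every attempt timed out */
} sketch_check_result_t;

/* Ping a peer up to max_attempts times.  If attempts_used is non-NULL it
 * receives the number of pings actually sent. */
static sketch_check_result_t
sketch_validate_peer(sketch_ping_fn ping, void *ctx,
                     int max_attempts, int timeout_ms, int *attempts_used)
{
    assert(ping != NULL && max_attempts > 0);
    for (int i = 1; i <= max_attempts; i++) {
        if (attempts_used != NULL) {
            *attempts_used = i;
        }
        if (ping(ctx, timeout_ms)) {
            return SKETCH_PEER_OK;
        }
    }
    return SKETCH_PEER_UNREACHABLE;
}

/* Fake transport for illustration: ctx points to an int counting how many
 * pings the "network" will still drop before one finally gets an ACK. */
static bool sketch_fake_ping(void *ctx, int timeout_ms)
{
    int *remaining_drops = ctx;
    (void)timeout_ms;  /* the fake transport never actually waits */
    if (*remaining_drops > 0) {
        (*remaining_drops)--;
        return false;
    }
    return true;
}
```

Bounding the attempts is what turns a silent misconfiguration (such as an MTU mismatch) into a reportable failure instead of an endless spin.
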
>>>>>>>> 
>>>>>>>> Since this is really a server-to-server network connectivity issue 
>>>>>>>> (vs. an MPI peer-to-peer connectivity issue), we only need to have one 
>>>>>>>> service thread for a whole server.  The other MPI procs on the server 
>>>>>>>> use RML to talk to it.  E.g., "Please ping the server where MPI proc X 
>>>>>>>> lives," and so on.  This seemed better than having a service thread in 
>>>>>>>> each MPI process.
>>>>>>>> 
>>>>>>>> We've thought a bit about what to do when the BTLs move down to OPAL 
>>>>>>>> (since they won't be able to use RML any more), but don't have a final 
>>>>>>>> solution yet...  We do still want to be able to utilize this 
>>>>>>>> capability even after the BTL move.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> [email protected]
>>>>>>>> For corporate legal information go to: 
>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> [email protected]
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>> Link to this post: 
>>>>>>>> http://www.open-mpi.org/community/lists/devel/2014/05/14673.php
