Great!

Would any particular time be good for you?  Josh won't be there in person, but 
will be in the same timezone as the meeting (US Central).  I'm assuming you'd 
want a start time like 9am, or 10am US Central at the latest.


On Dec 6, 2013, at 4:12 PM, Adrian Reber <adr...@lisas.de> wrote:

> I saw that the meeting would be available via webex and was planning to
> join. So yes, I will be there and it would be great to hear what
> is discussed about the FT design and what needs to be changed.
> 
>               Adrian
> 
> On Fri, Dec 06, 2013 at 02:36:09PM +0000, Jeff Squyres (jsquyres) wrote:
>> Good points.
>> 
>> You know, this checkpoint stuff is all on the agenda to discuss next week at 
>> the OMPI dev meeting in Chicago.  Ralph correctly points out that since the 
>> fundamental design of ORTE has changed (which caused all this FT bit rot), a 
>> bunch of the original FT design isn't right (or necessary) any more, anyway. 
>>  We need to talk through this stuff to figure out where to go.
>> 
>> Adrian: do you want to join us at the meeting via webex?  I think you're in 
>> Germany; we can do this part of the OMPI dev meeting first thing Friday 
>> morning US Central time, which would put it mid/late-afternoon for you.  It 
>> would probably be good for us to be introduced to you, and for you to hear 
>> all the discussion about how we think the FT design will need to be changed, 
>> etc.
>> 
>>    https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting
>> 
>> 
>> 
>> On Dec 6, 2013, at 9:30 AM, Josh Hursey <jjhur...@open-mpi.org> wrote:
>> 
>>> Since the blocking semantics are important for correctness of the prior 
>>> code, I would not just replace send_buffer with send_buffer_nb. This makes 
>>> the semantics incorrect, and will make things confusing later when you try 
>>> to sort out prior calls to send_buffer_nb with those that you replaced.
>>> 
>>> As an alternative I would suggest that you "#if 0" out those sections of 
>>> code and leave the send_buffer_nb alternative in a comment. Then leave a 
>>> big TODO comment there for you to go back and fix the semantics - which 
>>> will likely involve just rewriting large sections of that framework. But at 
>>> least you will be able to see what was there before when you try to move it 
>>> to a more nonblocking model.
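
A minimal sketch of the "disable it, keep it visible, leave a TODO" approach suggested above. Everything named `demo_*` / `DEMO_*` is a hypothetical stand-in invented for illustration; none of it is the real ORTE RML API, and the error code returned is only an assumption about how a disabled path might report itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, not ORTE's. */
#define DEMO_SUCCESS              0
#define DEMO_ERR_NOT_IMPLEMENTED -1

/* Hypothetical peer handle standing in for a process name. */
typedef struct { int rank; } demo_peer_t;

int demo_bkmrk_notify(const demo_peer_t *peer, const void *buf,
                      size_t len, int tag)
{
    (void)peer; (void)buf; (void)len; (void)tag;

#if 0
    /* Original blocking call, preserved for reference. The code after it
     * relied on the message being fully sent once this returned. */
    if (DEMO_SUCCESS != (ret = demo_send_buffer(peer, buf, len, tag, 0))) {
        return ret;
    }

    /* Candidate nonblocking form. NOT a drop-in replacement: completion
     * is only signalled through the callback.
     *
     *   demo_send_buffer_nb(peer, buf, len, tag, demo_cbfunc, NULL);
     */
#endif

    /* TODO: redesign this path for a nonblocking model; until then the
     * old code is compiled out and the caller is told so explicitly. */
    return DEMO_ERR_NOT_IMPLEMENTED;
}
```

The preprocessor skips everything between `#if 0` and `#endif`, so the old call does not need to compile, yet it stays in the file next to the note about what has to be rethought.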
>>> 
>>> The bkmrk component is subtle, maybe more than it should be. So keeping the 
>>> old blocking interfaces there will probably help quite a bit when you get 
>>> to it later. In that component the blocking calls are critical to 
>>> correctness, so we will need to sort out how to make that more asynchronous 
>>> in our redesign.
>>> 
>>> Other than that modification (#if 0 comments instead of nonblocking 
>>> replacements), I think this patch is fine. As was mentioned previously, we 
>>> will need to go back (after things compile) and figure out a new model for 
>>> this behavior.
>>> 
>>> Thanks!
>>> Josh
>>> 
>>> 
>>> 
>>> On Wed, Dec 4, 2013 at 9:58 AM, Jeff Squyres (jsquyres) 
>>> <jsquy...@cisco.com> wrote:
>>> Err... upon further thought, I might be totally wrong about emulating 
>>> blocking.  There might be (probably are?) rules/assumptions in the ORTE 
>>> layer (of which I am *not* an expert) that disallow you from [emulating] 
>>> blocking.
>>> 
>>> If that's the case, then there are architectural issues with converting from 
>>> blocking to nonblocking on both the sending and the receiving sides that 
>>> might be a bit thorny to sort out.
>>> 
>>> 
>>> 
>>> On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
>>> wrote:
>>> 
>>>> On Nov 25, 2013, at 9:59 AM, Adrian Reber <adr...@lisas.de> wrote:
>>>> 
>>>>> * Send Non-blocking
>>>>> */
>>>>> int orte_rml_ftrm_send_nb(orte_process_name_t* peer,
>>>>>                         struct iovec* msg,
>>>>>                         int count,
>>>>>                         orte_rml_tag_t tag,
>>>>> -                          int flags,
>>>>>                         orte_rml_callback_fn_t cbfunc,
>>>>>                         void* cbdata)
>>>>> {
>>>>>   int ret;
>>>>> 
>>>>>   opal_output_verbose(20, rml_ftrm_output_handle,
>>>>> -                        "orte_rml_ftrm: send_nb(%s, %d, %d, %d )",
>>>>> -                        ORTE_NAME_PRINT(peer), count, tag, flags);
>>>>> +                        "orte_rml_ftrm: send_nb(%s, %d, %d )",
>>>>> +                        ORTE_NAME_PRINT(peer), count, tag);
>>>>> 
>>>>>   if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) {
>>>>> -        if( ORTE_SUCCESS != (ret = orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, cbfunc, cbdata) ) ) {
>>>>> -            return ret;
>>>>> -        }
>>>>> -    }
>>>>> -
>>>>> -    return ORTE_SUCCESS;
>>>>> -}
>>>>> -
>>>>> -/*
>>>>> - * Send Buffer
>>>>> - */
>>>>> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer,
>>>>> -                              opal_buffer_t* buffer,
>>>>> -                              orte_rml_tag_t tag,
>>>>> -                              int flags)
>>>>> -{
>>>>> -    int ret;
>>>>> -
>>>>> -    opal_output_verbose(20, rml_ftrm_output_handle,
>>>>> -                        "orte_rml_ftrm: send_buffer(%s, %d, %d )",
>>>>> -                        ORTE_NAME_PRINT(peer), tag, flags);
>>>>> -
>>>>> -    if( NULL != orte_rml_ftrm_wrapped_module.send_buffer ) {
>>>>> -        if( ORTE_SUCCESS != (ret = orte_rml_ftrm_wrapped_module.send_buffer(peer, buffer, tag, flags) ) ) {
>>>>> +        if( ORTE_SUCCESS != (ret = orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, cbfunc, cbdata) ) ) {
>>>>>           return ret;
>>>>>       }
>>>>>   }
>>>> 
>>>> Similar to my reply about patch 3, I don't think this hunk is correct.
>>>> 
>>>> This routine accepts an iovec and sends it in a non-blocking fashion.  
>>>> I'll bet that the caller frees the iovec upon return from the function 
>>>> (because it used to be a blocking send, and freeing it immediately was 
>>>> acceptable).
>>>> 
>>>> But now the iovec may well still be in use when this function returns, so 
>>>> the caller should *not* free/reuse the iovec until it knows that the send 
>>>> has completed.
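
The lifetime concern can be illustrated with a small simulation. This is a sketch under stated assumptions: `demo_send_nb`, `demo_progress`, and the callback signature are hypothetical and only mimic the general shape of a nonblocking send with a completion callback; they are not the ORTE RML interface. The point is that the sender does not free the iovec after posting, and the completion callback releases it instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical completion callback type. */
typedef void (*demo_cb_fn_t)(int status, struct iovec *msg, int count,
                             void *cbdata);

/* One-slot "queue" standing in for the messaging layer's pending list. */
static struct {
    struct iovec *msg;
    int count;
    demo_cb_fn_t cb;
    void *cbdata;
    int active;
} pending;

/* Nonblocking send: only records the request. The iovec is NOT copied,
 * so it must stay valid until the callback fires. */
int demo_send_nb(struct iovec *msg, int count, demo_cb_fn_t cb, void *cbdata)
{
    if (pending.active) return -1;
    pending.msg = msg;
    pending.count = count;
    pending.cb = cb;
    pending.cbdata = cbdata;
    pending.active = 1;
    return 0;
}

/* Simulated progress: "transmits" by reading the caller's iovec, then
 * invokes the callback. Returns the number of bytes read. */
size_t demo_progress(void)
{
    size_t bytes = 0;
    if (!pending.active) return 0;
    for (int i = 0; i < pending.count; i++) bytes += pending.msg[i].iov_len;
    pending.active = 0;
    pending.cb(0, pending.msg, pending.count, pending.cbdata);
    return bytes;
}

/* Completion callback: the first point where freeing the iovec is safe. */
static void demo_free_on_complete(int status, struct iovec *msg, int count,
                                  void *cbdata)
{
    (void)status;
    for (int i = 0; i < count; i++) free(msg[i].iov_base);
    free(msg);
    *(int *)cbdata = 1;   /* tell the owner that the send finished */
}

/* Sender: allocates the message, posts it, and does not free it on success. */
int demo_post_message(const char *text, int *done)
{
    struct iovec *msg = malloc(sizeof(*msg));
    if (NULL == msg) return -1;
    msg->iov_len = strlen(text) + 1;
    msg->iov_base = malloc(msg->iov_len);
    if (NULL == msg->iov_base) { free(msg); return -1; }
    memcpy(msg->iov_base, text, msg->iov_len);

    *done = 0;
    int rc = demo_send_nb(msg, 1, demo_free_on_complete, done);
    if (0 != rc) { free(msg->iov_base); free(msg); }  /* never queued */
    return rc;
}
```

Had `demo_post_message` freed `msg` right after `demo_send_nb` returned, as a caller written for a blocking send would, `demo_progress` would later read freed memory.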
>>>> 
>>>> It may be more desirable to keep the blocking send function 
>>>> orte_rml_ftrm_send_buffer() and emulate blocking by invoking send_nb under 
>>>> the covers, but then not returning until the send callback has actually 
>>>> been invoked.
>>>> 
>>>> Then the blocking semantics expected by the caller may well be 
>>>> acceptable/safe.
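
The "emulate blocking on top of send_nb" idea could look roughly like the following. This is a self-contained simulation, not ORTE code: `demo_send_nb`, `demo_progress`, and `demo_send_blocking` are hypothetical names, and the three-tick delay merely stands in for a send that completes some time after it is posted. Whether the real ORTE layer allows a caller to spin on progress like this is an open question in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion callback type. */
typedef void (*demo_cb_fn_t)(int status, void *cbdata);

/* One pending request; completes after a few progress ticks. */
static struct {
    demo_cb_fn_t cb;
    void *cbdata;
    int ticks_left;
    int active;
} demo_pending;

/* Nonblocking send: queues the request and returns immediately. */
int demo_send_nb(const void *buf, size_t len, demo_cb_fn_t cb, void *cbdata)
{
    (void)buf; (void)len;
    if (demo_pending.active) return -1;
    demo_pending.cb = cb;
    demo_pending.cbdata = cbdata;
    demo_pending.ticks_left = 3;
    demo_pending.active = 1;
    return 0;
}

/* Simulated progress engine: fires the callback on the third tick. */
void demo_progress(void)
{
    if (!demo_pending.active) return;
    if (--demo_pending.ticks_left > 0) return;
    demo_pending.active = 0;
    demo_pending.cb(0, demo_pending.cbdata);
}

/* State shared between the waiting caller and the callback. */
typedef struct { int done; int status; } demo_wait_t;

static void demo_wake(int status, void *cbdata)
{
    demo_wait_t *w = cbdata;
    w->status = status;
    w->done = 1;
}

/* Blocking wrapper: posts the nonblocking send, then drives progress
 * until the callback has run, so the caller may reuse buf on return. */
int demo_send_blocking(const void *buf, size_t len)
{
    demo_wait_t w = { 0, 0 };
    int rc = demo_send_nb(buf, len, demo_wake, &w);
    if (0 != rc) return rc;
    while (!w.done) demo_progress();
    return w.status;
}
```

The wait state lives on the wrapper's stack, which is safe only because the wrapper does not return before the callback has run.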
>>>> 
>>>> This loses some potential optimizations of asynchronicity, but it may be 
>>>> worth it: a) performance in this part of the code isn't too critical, and 
>>>> b) blocking semantics are usually simpler and easier to maintain, from the 
>>>> caller's perspective.
>>>> 
>>>> This idea may also apply to what I said in reply to patch 3...?  (i.e., 
>>>> preserve a blocking send by using the _nb variant under the covers, but 
>>>> not returning until the nonblocking variant has actually completed the 
>>>> receive).
>>>> 
>>>> Since this is a fairly large change, I didn't look too closely throughout 
>>>> the rest of this patch.  I assume that there are a few other architectural 
>>>> cases similar to this one.
>>>> 
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to: 
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>> 
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> -- 
>>> Joshua Hursey
>>> Assistant Professor of Computer Science
>>> University of Wisconsin-La Crosse
>>> http://cs.uwlax.edu/~jjhursey
>> 
> 
>               Adrian
> 
> -- 
> Adrian Reber <adr...@lisas.de>            http://lisas.de/~adrian/
>       The FIELD GUIDE to NORTH AMERICAN MALES
> 
> SPECIES:      Cranial Males
> SUBSPECIES:   The Hacker (homo computatis)
> Plumage:
>       All clothes have a slightly crumpled look as though they came off the
>       top of the laundry basket.  Style varies with status.  Hacker managers
>       wear gray polyester slacks, pink or pastel shirts with wide collars,
>       and paisley ties; staff wears cinched-up baggy corduroy pants, white
>       or blue shirts with button-down collars, and penholder in pocket.
>       Both managers and staff wear running shoes to work, and a black
>       plastic digital watch with calculator.


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
