Great! Would any particular time be good for you? Josh won't be there in person, but will be in the same timezone as the meeting (US Central). I'm assuming you'd want a start time like 9am, or 10am US Central at the latest.
On Dec 6, 2013, at 4:12 PM, Adrian Reber <adr...@lisas.de> wrote: > I saw that the meeting would be available via webex and was planning to > join. So yes, I will be there and it would be great to hear what > is discussed about the FT design and what needs to be changed. > > Adrian > > On Fri, Dec 06, 2013 at 02:36:09PM +0000, Jeff Squyres (jsquyres) wrote: >> Good points. >> >> You know, this checkpoint stuff is all on the agenda to discuss next week at >> the OMPI dev meeting in Chicago. Ralph correctly points out that since the >> fundamental design of ORTE has changed (which caused all this FT bit rot), a >> bunch of the original FT design isn't right (or necessary) any more, anyway. >> We need to talk through this stuff to figure out where to go. >> >> Adrian: do you want to join us at the meeting via webex? I think you're in >> Germany; we can do this part of the OMPI dev meeting first thing Friday >> morning US Central time, which would put it mid/late-afternoon for you. It >> would probably be good for us to be introduced to you, and for you to hear >> all the discussion about how we think the FT design will need to be changed, >> etc. >> >> https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting >> >> >> >> On Dec 6, 2013, at 9:30 AM, Josh Hursey <jjhur...@open-mpi.org> wrote: >> >>> Since the blocking semantics are important for correctness of the prior >>> code, I would not just replace send_buffer with send_buffer_nb. This makes >>> the semantics incorrect, and will make things confusing later when you try >>> to sort out prior calls to send_buffer_nb with those that you replaced. >>> >>> As an alternative I would suggest that you "#ifdef 0" out those sections of >>> code and leave the send_buffer_nb alternative in a comment. Then leave a >>> big TODO comment there for you to go back and fix the semantics - which >>> will likely involve just rewriting large sections of that framework. But at >>> least you will be able to see what was there before when you try to move it >>> to a more nonblocking model. >>> >>> The bkmrk component is subtle, maybe more that it should be. So keeping the >>> old blocking interfaces there will probably help quite a bit when you get >>> to it later. In that component the blocking calls are critical to >>> correctness, so we will need to sort out how to make that more asynchronous >>> in our redesign. >>> >>> Other than that modification (#ifdef comments instead of nonblocking >>> replacements), I think this patch is fine. As was mentioned previously, we >>> will need to go back (after things compile) and figure out a new model for >>> this behavior. >>> >>> Thanks! >>> Josh >>> >>> >>> >>> On Wed, Dec 4, 2013 at 9:58 AM, Jeff Squyres (jsquyres) >>> <jsquy...@cisco.com> wrote: >>> Err... upon further thought, I might be totally wrong about emulating >>> blocking. There might be (probably are?) rules/assumptions in the ORTE >>> layer (of which I am *not* an expert) that disallow you from [emulating] >>> blocking. >>> >>> If that's the case, then there's architectural issues with converting from >>> blocking to nonblocking on both the sending and the receiving sides that >>> might be a bit thorny to sort out. >>> >>> >>> >>> On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> >>> wrote: >>> >>>> On Nov 25, 2013, at 9:59 AM, Adrian Reber <adr...@lisas.de> wrote: >>>> >>>>> * Send Non-blocking >>>>> */ >>>>> int orte_rml_ftrm_send_nb(orte_process_name_t* peer, >>>>> struct iovec* msg, >>>>> int count, >>>>> orte_rml_tag_t tag, >>>>> - int flags, >>>>> orte_rml_callback_fn_t cbfunc, >>>>> void* cbdata) >>>>> { >>>>> int ret; >>>>> >>>>> opal_output_verbose(20, rml_ftrm_output_handle, >>>>> - "orte_rml_ftrm: send_nb(%s, %d, %d, %d )", >>>>> - ORTE_NAME_PRINT(peer), count, tag, flags); >>>>> + "orte_rml_ftrm: send_nb(%s, %d, %d )", >>>>> + ORTE_NAME_PRINT(peer), count, tag); >>>>> >>>>> if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) { >>>>> - if( ORTE_SUCCESS != (ret = >>>>> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, >>>>> cbfunc, cbdata) ) ) { >>>>> - return ret; >>>>> - } >>>>> - } >>>>> - >>>>> - return ORTE_SUCCESS; >>>>> -} >>>>> - >>>>> -/* >>>>> - * Send Buffer >>>>> - */ >>>>> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer, >>>>> - opal_buffer_t* buffer, >>>>> - orte_rml_tag_t tag, >>>>> - int flags) >>>>> -{ >>>>> - int ret; >>>>> - >>>>> - opal_output_verbose(20, rml_ftrm_output_handle, >>>>> - "orte_rml_ftrm: send_buffer(%s, %d, %d )", >>>>> - ORTE_NAME_PRINT(peer), tag, flags); >>>>> - >>>>> - if( NULL != orte_rml_ftrm_wrapped_module.send_buffer ) { >>>>> - if( ORTE_SUCCESS != (ret = >>>>> orte_rml_ftrm_wrapped_module.send_buffer(peer, buffer, tag, flags) ) ) { >>>>> + if( ORTE_SUCCESS != (ret = >>>>> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, cbfunc, >>>>> cbdata) ) ) { >>>>> return ret; >>>>> } >>>>> } >>>> >>>> Similar to my reply about patch 3, I don't think this hunk is correct. >>>> >>>> This routine accepts an iovec and sends it in a non-blocking fashion. >>>> I'll bet that the caller frees the iovec upon return from the function >>>> (because it used to be a blocking send, and freeing it immediately was >>>> acceptable). >>>> >>>> But now the iovec may well still be in use when this function returns, so >>>> the caller should *not* free/reuse the iovec until it knows that the send >>>> has complete. >>>> >>>> It may be more desirable to keep the blocking send function >>>> orte_rml_ftrm_send_buffer() and emulate blocking by invoking send_nb under >>>> the covers, but then not returning until the send callback has actually >>>> been invoked. >>>> >>>> Then the blocking semantics expected by the caller may well be >>>> acceptable/safe. >>>> >>>> This loses some potential optimizations of asynchronicity, but it may be >>>> worth it: a) performance in this part of the code isn't too critical, and >>>> b) blocking semantics are usually simpler and easier to maintain, from the >>>> caller's perspective. >>>> >>>> This idea may also apply to what I said in reply to patch 3...? (i.e., >>>> preserve a blocking send by using the _nb variant under the covers, but >>>> not returning until the nonblocking variant has actually completed the >>>> receive). >>>> >>>> Since this is a fairly large change, I didn't look too closely throughout >>>> the rest of this patch. I assume that there are a few other architectural >>>> cases similar to this one. >>>> >>>> -- >>>> Jeff Squyres >>>> jsquy...@cisco.com >>>> For corporate legal information go to: >>>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>>> >>>> _______________________________________________ >>>> devel mailing list >>>> de...@open-mpi.org >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> >>> >>> -- >>> Jeff Squyres >>> jsquy...@cisco.com >>> For corporate legal information go to: >>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>> >>> _______________________________________________ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> >>> >>> >>> -- >>> Joshua Hursey >>> Assistant Professor of Computer Science >>> University of Wisconsin-La Crosse >>> http://cs.uwlax.edu/~jjhursey >>> _______________________________________________ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> _______________________________________________ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > Adrian > > -- > Adrian Reber <adr...@lisas.de> http://lisas.de/~adrian/ > The FIELD GUIDE to NORTH AMERICAN MALES > > SPECIES: Cranial Males > SUBSPECIES: The Hacker (homo computatis) > Plumage: > All clothes have a slightly crumpled look as though they came off the > top of the laundry basket. Style varies with status. Hacker managers > wear gray polyester slacks, pink or pastel shirts with wide collars, > and paisley ties; staff wears cinched-up baggy corduroy pants, white > or blue shirts with button-down collars, and penholder in pocket. > Both managers and staff wear running shoes to work, and a black > plastic digital watch with calculator. > _______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/