That's a great idea Jeff. I did not know it had made it on the agenda for that meeting. I would like to attend, and my Friday morning is pretty open at the moment. With timezones, would a doodle poll be useful here or do you think we can sort it out via email?
Thanks. Josh On Fri, Dec 6, 2013 at 8:36 AM, Jeff Squyres (jsquyres) <[email protected]>wrote: > Good points. > > You know, this checkpoint stuff is all on the agenda to discuss next week > at the OMPI dev meeting in Chicago. Ralph correctly points out that since > the fundamental design of ORTE has changed (which caused all this FT bit > rot), a bunch of the original FT design isn't right (or necessary) any > more, anyway. We need to talk through this stuff to figure out where to go. > > Adrian: do you want to join us at the meeting via webex? I think you're > in Germany; we can do this part of the OMPI dev meeting first thing Friday > morning US Central time, which would put it mid/late-afternoon for you. It > would probably be good for us to be introduced to you, and for you to hear > all the discussion about how we think the FT design will need to be > changed, etc. > > https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting > > > > On Dec 6, 2013, at 9:30 AM, Josh Hursey <[email protected]> wrote: > > > Since the blocking semantics are important for correctness of the prior > code, I would not just replace send_buffer with send_buffer_nb. This makes > the semantics incorrect, and will make things confusing later when you try > to sort out prior calls to send_buffer_nb with those that you replaced. > > > > As an alternative I would suggest that you "#ifdef 0" out those sections > of code and leave the send_buffer_nb alternative in a comment. Then leave a > big TODO comment there for you to go back and fix the semantics - which > will likely involve just rewriting large sections of that framework. But at > least you will be able to see what was there before when you try to move it > to a more nonblocking model. > > > > The bkmrk component is subtle, maybe more that it should be. So keeping > the old blocking interfaces there will probably help quite a bit when you > get to it later. In that component the blocking calls are critical to > correctness, so we will need to sort out how to make that more asynchronous > in our redesign. > > > > Other than that modification (#ifdef comments instead of nonblocking > replacements), I think this patch is fine. As was mentioned previously, we > will need to go back (after things compile) and figure out a new model for > this behavior. > > > > Thanks! > > Josh > > > > > > > > On Wed, Dec 4, 2013 at 9:58 AM, Jeff Squyres (jsquyres) < > [email protected]> wrote: > > Err... upon further thought, I might be totally wrong about emulating > blocking. There might be (probably are?) rules/assumptions in the ORTE > layer (of which I am *not* an expert) that disallow you from [emulating] > blocking. > > > > If that's the case, then there's architectural issues with converting > from blocking to nonblocking on both the sending and the receiving sides > that might be a bit thorny to sort out. > > > > > > > > On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)" < > [email protected]> wrote: > > > > > On Nov 25, 2013, at 9:59 AM, Adrian Reber <[email protected]> wrote: > > > > > >> * Send Non-blocking > > >> */ > > >> int orte_rml_ftrm_send_nb(orte_process_name_t* peer, > > >> struct iovec* msg, > > >> int count, > > >> orte_rml_tag_t tag, > > >> - int flags, > > >> orte_rml_callback_fn_t cbfunc, > > >> void* cbdata) > > >> { > > >> int ret; > > >> > > >> opal_output_verbose(20, rml_ftrm_output_handle, > > >> - "orte_rml_ftrm: send_nb(%s, %d, %d, %d )", > > >> - ORTE_NAME_PRINT(peer), count, tag, flags); > > >> + "orte_rml_ftrm: send_nb(%s, %d, %d )", > > >> + ORTE_NAME_PRINT(peer), count, tag); > > >> > > >> if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) { > > >> - if( ORTE_SUCCESS != (ret = > orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, cbfunc, > cbdata) ) ) { > > >> - return ret; > > >> - } > > >> - } > > >> - > > >> - return ORTE_SUCCESS; > > >> -} > > >> - > > >> -/* > > >> - * Send Buffer > > >> - */ > > >> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer, > > >> - opal_buffer_t* buffer, > > >> - orte_rml_tag_t tag, > > >> - int flags) > > >> -{ > > >> - int ret; > > >> - > > >> - opal_output_verbose(20, rml_ftrm_output_handle, > > >> - "orte_rml_ftrm: send_buffer(%s, %d, %d )", > > >> - ORTE_NAME_PRINT(peer), tag, flags); > > >> - > > >> - if( NULL != orte_rml_ftrm_wrapped_module.send_buffer ) { > > >> - if( ORTE_SUCCESS != (ret = > orte_rml_ftrm_wrapped_module.send_buffer(peer, buffer, tag, flags) ) ) { > > >> + if( ORTE_SUCCESS != (ret = > orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, cbfunc, cbdata) > ) ) { > > >> return ret; > > >> } > > >> } > > > > > > Similar to my reply about patch 3, I don't think this hunk is correct. > > > > > > This routine accepts an iovec and sends it in a non-blocking fashion. > I'll bet that the caller frees the iovec upon return from the function > (because it used to be a blocking send, and freeing it immediately was > acceptable). > > > > > > But now the iovec may well still be in use when this function returns, > so the caller should *not* free/reuse the iovec until it knows that the > send has complete. > > > > > > It may be more desirable to keep the blocking send function > orte_rml_ftrm_send_buffer() and emulate blocking by invoking send_nb under > the covers, but then not returning until the send callback has actually > been invoked. > > > > > > Then the blocking semantics expected by the caller may well be > acceptable/safe. > > > > > > This loses some potential optimizations of asynchronicity, but it may > be worth it: a) performance in this part of the code isn't too critical, > and b) blocking semantics are usually simpler and easier to maintain, from > the caller's perspective. > > > > > > This idea may also apply to what I said in reply to patch 3...? > (i.e., preserve a blocking send by using the _nb variant under the covers, > but not returning until the nonblocking variant has actually completed the > receive). > > > > > > Since this is a fairly large change, I didn't look too closely > throughout the rest of this patch. I assume that there are a few other > architectural cases similar to this one. > > > > > > -- > > > Jeff Squyres > > > [email protected] > > > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > > > _______________________________________________ > > > devel mailing list > > > [email protected] > > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > > > -- > > Jeff Squyres > > [email protected] > > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > _______________________________________________ > > devel mailing list > > [email protected] > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > > > > > -- > > Joshua Hursey > > Assistant Professor of Computer Science > > University of Wisconsin-La Crosse > > http://cs.uwlax.edu/~jjhursey > > _______________________________________________ > > devel mailing list > > [email protected] > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > -- > Jeff Squyres > [email protected] > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > _______________________________________________ > devel mailing list > [email protected] > http://www.open-mpi.org/mailman/listinfo.cgi/devel > -- Joshua Hursey Assistant Professor of Computer Science University of Wisconsin-La Crosse http://cs.uwlax.edu/~jjhursey
