Josh explained to me a few days ago that after a checkpoint has been received, TCP should no longer be used, so that no messages are lost. The communication happens over named pipes, and therefore (I think) the OOB ft_event() is used to quiet everything besides the pipes. This all seems to work, but I was confused because the ft_event() functions in oob/tcp and oob/ud do not seem to contain any functionality.
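To make the dispatch concrete, here is a rough stand-alone sketch of a base-level ft_event() that forwards to whatever hooks the modules have registered. Every name in it (the sketch_* identifiers, the module table, the return code) is made up for illustration and is not the actual ORTE code; the "tcp" and "ud" entries only mirror the fact that the hooks in oob/tcp and oob/ud appear to be empty.

```c
#include <stddef.h>

#define SKETCH_SUCCESS 0  /* hypothetical stand-in for a success return code */

/* Hypothetical hook type: a module may register an ft_event function. */
typedef int (*sketch_ft_event_fn_t)(int state);

typedef struct {
    const char           *name;
    sketch_ft_event_fn_t  ft_event;  /* NULL if the module registers no hook */
} sketch_oob_module_t;

/* An empty hook, like the ones that seem to exist in oob/tcp and oob/ud. */
static int sketch_noop_ft_event(int state)
{
    (void)state;  /* nothing to do for this transport */
    return SKETCH_SUCCESS;
}

/* Hypothetical table of registered modules. */
static sketch_oob_module_t sketch_modules[] = {
    { "tcp",  sketch_noop_ft_event },
    { "ud",   sketch_noop_ft_event },
    { "none", NULL },
};

/* Base-level dispatcher: forwards the event to each registered hook and
 * stops at the first failure. It only calls the module hooks, never itself. */
int sketch_oob_base_ft_event(int state)
{
    size_t n = sizeof(sketch_modules) / sizeof(sketch_modules[0]);

    for (size_t i = 0; i < n; i++) {
        if (NULL == sketch_modules[i].ft_event) {
            continue;  /* module has no hook registered */
        }
        int ret = sketch_modules[i].ft_event(state);
        if (SKETCH_SUCCESS != ret) {
            return ret;
        }
    }
    return SKETCH_SUCCESS;
}
```

With only empty hooks registered, the dispatcher returns success for any state without doing any work.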
So do I try to fix the ft_event() function in oob/base/ so that it calls the registered ft_event() function, which does nothing, or do I just remove the call to orte_oob.ft_event()?

On Thu, Feb 06, 2014 at 10:49:25AM -0800, Ralph Castain wrote:
> The only reason I can think of for an OOB ft-event would be to tell the OOB
> to stop sending any messages. You would need to push that into the event
> library and use a callback event to let you know when it was done.
>
> Of course, once you did that, the OOB would no longer be available to, for
> example, tell the local daemon that the app is ready for checkpoint :-)
>
> Afraid I'll have to defer to Josh H for any further guidance.
>
>
> On Feb 6, 2014, at 8:15 AM, Adrian Reber <adr...@lisas.de> wrote:
>
> > When I initially made the C/R code compile again I made following
> > change:
> >
> > diff --git a/orte/mca/rml/oob/rml_oob_component.c
> > b/orte/mca/rml/oob/rml_oob_component.c
> > index f0b22fc..90ed086 100644
> > --- a/orte/mca/rml/oob/rml_oob_component.c
> > +++ b/orte/mca/rml/oob/rml_oob_component.c
> > @@ -185,8 +185,7 @@ orte_rml_oob_ft_event(int state) {
> >          ;
> >      }
> >
> > -    if( ORTE_SUCCESS !=
> > -        (ret = orte_oob.ft_event(state)) ) {
> > +    if( ORTE_SUCCESS != (ret = orte_rml_oob_ft_event(state)) ) {
> >          ORTE_ERROR_LOG(ret);
> >          exit_status = ret;
> >          goto cleanup;
> >
> >
> >
> > This is, of course, wrong. Now the function calls itself in a loop until
> > it crashes. Looking at orte/mca/oob there is still a ft_event()
> > function, but it is disabled using "#if 0". Looking at other functions
> > it seems I would need to create something like
> >
> > #define ORTE_OOB_FT_EVENT(m)
> >
> > Looking at the modules in orte/mca/oob/ it seems ft_event is implemented
> > in some places but it never seems to have any real functionality. Is
> > ft_event() actually needed there?
> >
> > Adrian