Josh explained it to me a few days ago: after a checkpoint has been
received, TCP should no longer be used, so that no messages are lost. The
communication then happens over named pipes, and therefore (I think) the OOB
ft_event() is used to quiet everything besides the pipes. This all seems
to work, but I was confused because the ft_event() functions
in oob/tcp and oob/ud do not seem to contain any functionality.

So should I fix the ft_event() function in oob/base/ so that it calls the
registered ft_event() function, which does nothing, or should I just remove
the call to the orte oob ft_event()?
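
To make the first option concrete, here is a rough sketch of a base-level
ft_event() that just forwards to whatever the modules registered. This is
not the real ORTE code: demo_oob_module_t, demo_noop_ft_event(),
demo_oob_base_ft_event() and the DEMO_* constants are names I made up for
illustration, and the module table is a simplification.

```c
#include <stddef.h>

/* Hypothetical return codes, standing in for ORTE_SUCCESS and friends. */
#define DEMO_SUCCESS 0
#define DEMO_ERROR  (-1)

/* Hypothetical signature of a module's ft_event() hook. */
typedef int (*demo_ft_event_fn_t)(int state);

/* Hypothetical, simplified stand-in for an OOB module entry. */
typedef struct {
    const char *name;
    demo_ft_event_fn_t ft_event;   /* NULL if the module has no hook */
} demo_oob_module_t;

/* A hook that accepts the event and does nothing, which is what the
 * tcp and ud functions appear to do. */
int demo_noop_ft_event(int state)
{
    (void)state;
    return DEMO_SUCCESS;
}

/* A hook that always fails, only here to exercise the error path. */
int demo_failing_ft_event(int state)
{
    (void)state;
    return DEMO_ERROR;
}

/* Base-level dispatcher: forward the state to every registered hook and
 * return the first error, or DEMO_SUCCESS if all hooks succeed. */
int demo_oob_base_ft_event(const demo_oob_module_t *mods, size_t count,
                           int state)
{
    size_t i;

    for (i = 0; i < count; ++i) {
        int ret;

        if (NULL == mods[i].ft_event) {
            continue;              /* nothing registered, skip it */
        }
        ret = mods[i].ft_event(state);
        if (DEMO_SUCCESS != ret) {
            return ret;
        }
    }
    return DEMO_SUCCESS;
}
```

As long as every registered hook is a no-op, this dispatcher only ever
returns success, so in that case it would behave the same as removing the
call altogether.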

On Thu, Feb 06, 2014 at 10:49:25AM -0800, Ralph Castain wrote:
> The only reason I can think of for an OOB ft-event would be to tell the OOB 
> to stop sending any messages. You would need to push that into the event 
> library and use a callback event to let you know when it was done.
> 
> Of course, once you did that, the OOB would no longer be available to, for 
> example, tell the local daemon that the app is ready for checkpoint :-)
> 
> Afraid I'll have to defer to Josh H for any further guidance.
> 
> 
> On Feb 6, 2014, at 8:15 AM, Adrian Reber <adr...@lisas.de> wrote:
> 
> > When I initially made the C/R code compile again I made the following
> > change:
> > 
> > diff --git a/orte/mca/rml/oob/rml_oob_component.c b/orte/mca/rml/oob/rml_oob_component.c
> > index f0b22fc..90ed086 100644
> > --- a/orte/mca/rml/oob/rml_oob_component.c
> > +++ b/orte/mca/rml/oob/rml_oob_component.c
> > @@ -185,8 +185,7 @@ orte_rml_oob_ft_event(int state) {
> >         ;
> >     }
> > 
> > -    if( ORTE_SUCCESS != 
> > -        (ret = orte_oob.ft_event(state)) ) {
> > +    if( ORTE_SUCCESS != (ret = orte_rml_oob_ft_event(state)) ) {
> >         ORTE_ERROR_LOG(ret);
> >         exit_status = ret;
> >         goto cleanup;
> > 
> > 
> > 
> > This is, of course, wrong. Now the function calls itself in a loop until
> > it crashes. Looking at orte/mca/oob there is still a ft_event()
> > function, but it is disabled using "#if 0". Looking at other functions
> > it seems I would need to create something like
> > 
> > #define ORTE_OOB_FT_EVENT(m)
> > 
> > Looking at the modules in orte/mca/oob/ it seems ft_event is implemented
> > in some places but it never seems to have any real functionality. Is
> > ft_event() actually needed there?
> > 
> >             Adrian
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
