On Tue, Feb 18, 2014 at 03:46:58PM +0100, Adrian Reber wrote:
> > >>> I tried to implement something like you described. It is not yet event
> > >>> driven, but before continuing I wanted to get some feedback if it is at
> > >>> least the right start:
> > >>> 
> > >>> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=5048a9cec2cd0bc4867eadfd7e48412b73267706
> > >>> 
> > >>> I looked at the other ORTE_OOB_* macros and tried to model my
> > >>> functionality a bit after what I have seen there. Right now it is still
> > >>> a simple function which just tries to call ft_event() on all oob
> > >>> components. Does this look right so far?
> > >> 
> > >> Sorry for the delay - yes, that looks like the right direction. I would 
> > >> suggest doing it via the current state machine, though, by simply 
> > >> defining another job or proc state in orte/mca/plm/plm_types.h, and then 
> > >> registering a callback function using the 
> > >> orte_state.add_job[proc]_state(state, function to be called, 
> > >> ORTE_ERR_PRI). Then you can activate it by calling 
> > >> ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in the 
> > >> proper order.
> > > 
> > > What is a job/proc in the Open MPI context?
> > 
> > A "job" is the entire application, while a "proc" is just one process in 
> > that application. In this case you could use either one as you are 
> > checkpointing the entire job, but all this activity is occurring inside 
> > each proc. So I'd suggest defining it as a proc state since it only really 
> > involves local actions.
> > 
> > If you like, I can define the required code in the trunk and let you fill 
> > in the event functionality.
> 
> That would be great.
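The state-machine suggestion quoted above can be modelled with a small, self-contained C sketch. It is only a sketch under stated assumptions. Every demo_* name is hypothetical and stands in for the pattern described in the thread: a new proc state (which would live in orte/mca/plm/plm_types.h), a registration call analogous to orte_state.add_proc_state(state, cbfunc, priority), and an activation call analogous to ORTE_ACTIVATE_PROC_STATE. It does not reproduce ORTE's real implementation or priority constants, and the "tcp" and "ud" OOB components with their ft_event() hooks are invented stand-ins. The point it illustrates is ordering. Activating a state only queues its callback, and a dispatch loop runs the queued callbacks highest priority first. The FT callback then simply calls ft_event() on every OOB component, as in the original patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL names throughout: these model the pattern only and are
 * not the real ORTE state-machine API. */
enum { DEMO_PROC_STATE_FT_EVENT = 100, DEMO_PROC_STATE_RUNNING = 101 };
enum { DEMO_MSG_PRI = 0, DEMO_ERR_PRI = 10 }; /* larger runs first */

typedef void (*demo_state_cbfunc_t)(int state);

typedef struct {
    int state;
    demo_state_cbfunc_t cbfunc;
    int priority;
} demo_state_entry_t;

#define DEMO_MAX 16
static demo_state_entry_t demo_registry[DEMO_MAX]; /* registered states */
static int demo_nregistered = 0;
static demo_state_entry_t demo_pending[DEMO_MAX];  /* activated, not yet run */
static int demo_npending = 0;

/* Record of dispatched work, so the execution order can be inspected. */
static char demo_trace[256];

static void demo_trace_append(const char *s)
{
    strncat(demo_trace, s, sizeof(demo_trace) - strlen(demo_trace) - 1);
}

/* Analogue of add_proc_state(state, cbfunc, priority). */
static int demo_add_proc_state(int state, demo_state_cbfunc_t cbfunc, int priority)
{
    if (demo_nregistered >= DEMO_MAX) {
        return -1;
    }
    demo_registry[demo_nregistered].state = state;
    demo_registry[demo_nregistered].cbfunc = cbfunc;
    demo_registry[demo_nregistered].priority = priority;
    demo_nregistered++;
    return 0;
}

/* Analogue of ACTIVATE_PROC_STATE: queue the registered callback.
 * Nothing runs here; the dispatch loop runs it later. */
static int demo_activate_proc_state(int state)
{
    for (int i = 0; i < demo_nregistered; i++) {
        if (demo_registry[i].state == state) {
            assert(demo_npending < DEMO_MAX);
            demo_pending[demo_npending++] = demo_registry[i];
            return 0;
        }
    }
    return -1; /* no callback registered for this state */
}

/* Run queued callbacks, highest priority first, FIFO within one level. */
static void demo_run_pending(void)
{
    while (demo_npending > 0) {
        int best = 0;
        for (int i = 1; i < demo_npending; i++) {
            if (demo_pending[i].priority > demo_pending[best].priority) {
                best = i;
            }
        }
        demo_state_entry_t e = demo_pending[best];
        /* Remove the chosen entry, keeping the others in order. */
        memmove(&demo_pending[best], &demo_pending[best + 1],
                (size_t)(demo_npending - best - 1) * sizeof(demo_pending[0]));
        demo_npending--;
        e.cbfunc(e.state); /* may itself activate further states */
    }
}

/* Hypothetical OOB components, each exposing an ft_event() hook. */
typedef struct {
    const char *name;
    int (*ft_event)(int state);
} demo_oob_component_t;

static int tcp_ft_event(int state) { (void)state; demo_trace_append("tcp;"); return 0; }
static int ud_ft_event(int state) { (void)state; demo_trace_append("ud;"); return 0; }

static demo_oob_component_t demo_oob_components[] = {
    { "tcp", tcp_ft_event },
    { "ud", ud_ft_event },
};

/* Callback for the FT proc state: call ft_event() on every component. */
static void demo_ft_event_cb(int state)
{
    size_t n = sizeof(demo_oob_components) / sizeof(demo_oob_components[0]);
    for (size_t i = 0; i < n; i++) {
        if (demo_oob_components[i].ft_event(state) != 0) {
            fprintf(stderr, "ft_event failed in %s\n", demo_oob_components[i].name);
        }
    }
}

static void demo_running_cb(int state) { (void)state; demo_trace_append("running;"); }
```

A caller would register both states once at startup, activate the FT state whenever a checkpoint is requested, and let the dispatch loop (demo_run_pending() here, the event library in ORTE) run it. Because the FT state is registered with the higher priority, its callback runs before lower-priority work that was queued earlier, which is the "handled in the proper order" property mentioned above.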

Thanks for your changes. When building with --with-ft there are a few compiler
errors, which I tried to fix with the following patch:

https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=71521789ef9d248a7eef53030d2ec5de900faa4c

                Adrian
