On Tue, Feb 18, 2014 at 03:46:58PM +0100, Adrian Reber wrote:
> > >>> I tried to implement something like you described. It is not yet event
> > >>> driven, but before continuing I wanted to get some feedback if it is at
> > >>> least the right start:
> > >>>
> > >>> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=5048a9cec2cd0bc4867eadfd7e48412b73267706
> > >>>
> > >>> I looked at the other ORTE_OOB_* macros and tried to model my
> > >>> functionality a bit after what I have seen there. Right now it is still
> > >>> a simple function which just tries to call ft_event() on all oob
> > >>> components. Does this look right so far?
> > >>
> > >> Sorry for delay - yes, that looks like the right direction. I would
> > >> suggest doing it via the current state machine, though, by simply
> > >> defining another job or proc state in orte/mca/plm/plm_types.h, and then
> > >> registering a callback function using the
> > >> orte_state.add_job[proc]_state(state, function to be called,
> > >> ORTE_ERR_PRI). Then you can activate it by calling
> > >> ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in the
> > >> proper order.
> > >
> > > What is a job/proc in the Open MPI context.
> >
> > A "job" is the entire application, while a "proc" is just one process in
> > that application. In this case you could use either one as you are
> > checkpointing the entire job, but all this activity is occurring inside
> > each proc. So I'd suggest defining it as a proc state since it only really
> > involves local actions.
> >
> > If you like, I can define the required code in the trunk and let you fill
> > in the event functionality.
>
> That would be great.
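For reference, the pattern suggested in the quoted text (define a proc state, register a callback for it, activate it and let the state machine run the callback later) can be sketched as a standalone toy. This is only an illustration under my own assumptions: every toy_* / TOY_* name is hypothetical, nothing here is the actual ORTE code, and the priority argument (ORTE_ERR_PRI in the real call) is left out to keep it short.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; NOT the ORTE definitions. */
typedef enum {
    TOY_PROC_STATE_RUNNING = 0,
    TOY_PROC_STATE_FT_EVENT,   /* the newly defined proc state */
    TOY_PROC_STATE_MAX
} toy_proc_state_t;

typedef void (*toy_state_cbfunc_t)(toy_proc_state_t state, void *cbdata);

/* One callback slot per state, plus a small FIFO of activated states. */
static toy_state_cbfunc_t toy_callbacks[TOY_PROC_STATE_MAX];

#define TOY_QUEUE_LEN 16
static toy_proc_state_t toy_queue[TOY_QUEUE_LEN];
static int toy_queue_head = 0;
static int toy_queue_tail = 0;

/* Register cb for state. Returns 0 on success, -1 on an invalid state. */
static int toy_add_proc_state(toy_proc_state_t state, toy_state_cbfunc_t cb)
{
    assert(cb != NULL);
    if ((int)state < 0 || (int)state >= TOY_PROC_STATE_MAX) {
        return -1;
    }
    toy_callbacks[state] = cb;
    return 0;
}

/* Activating only queues the state; the callback runs later from
 * toy_progress(). Returns 0 on success, -1 if invalid or queue full. */
static int toy_activate_proc_state(toy_proc_state_t state)
{
    if ((int)state < 0 || (int)state >= TOY_PROC_STATE_MAX) {
        return -1;
    }
    if (toy_queue_tail >= TOY_QUEUE_LEN) {
        return -1;
    }
    toy_queue[toy_queue_tail++] = state;
    return 0;
}

/* Drain the queue in FIFO order. Returns the number of callbacks invoked. */
static int toy_progress(void *cbdata)
{
    int handled = 0;
    while (toy_queue_head < toy_queue_tail) {
        toy_proc_state_t s = toy_queue[toy_queue_head++];
        if (toy_callbacks[s] != NULL) {
            toy_callbacks[s](s, cbdata);
            handled++;
        }
    }
    return handled;
}

/* Example callback: this is where the per-component ft_event() calls
 * would go; here it only counts how often it was invoked. */
static void toy_ft_event_cb(toy_proc_state_t state, void *cbdata)
{
    int *calls = (int *)cbdata;
    (void)state;
    (*calls)++;
    printf("ft_event handler ran (call %d)\n", *calls);
}

/* Register, activate, then progress. Returns the number of times the
 * callback ran, or -1 if it ran before the queue was progressed. */
static int toy_demo(void)
{
    int calls = 0;
    toy_queue_head = toy_queue_tail = 0;
    toy_add_proc_state(TOY_PROC_STATE_FT_EVENT, toy_ft_event_cb);
    toy_activate_proc_state(TOY_PROC_STATE_FT_EVENT);
    if (calls != 0) {
        return -1;   /* activation alone must not invoke the callback */
    }
    toy_progress(&calls);
    return calls;
}
```

Calling toy_demo() from a main() registers the handler, activates the state once and drains the queue. The point of the indirection is that activation does not call the handler directly, so the handler runs in order with whatever else is already queued.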
Thanks for your changes. When using --with-ft there are a few compiler
errors, which I tried to fix with the following patch:

https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=71521789ef9d248a7eef53030d2ec5de900faa4c

		Adrian