On Mar 7, 2014, at 3:07 AM, Adrian Reber <adr...@lisas.de> wrote:

> On Thu, Mar 06, 2014 at 07:47:22PM -0800, Ralph Castain wrote:
>>>>>>> Sorry for delay - yes, that looks like the right direction. I would 
>>>>>>> suggest doing it via the current state machine, though, by simply 
>>>>>>> defining another job or proc state in orte/mca/plm/plm_types.h, and 
>>>>>>> then registering a callback function using the 
>>>>>>> orte_state.add_job[proc]_state(state, function to be called, 
>>>>>>> ORTE_ERR_PRI). Then you can activate it by calling 
>>>>>>> ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in 
>>>>>>> the proper order.
>>>>>> 
>>>>>> What is a job/proc in the Open MPI context?
>>>>> 
>>>>> A "job" is the entire application, while a "proc" is just one process in 
>>>>> that application. In this case you could use either one as you are 
>>>>> checkpointing the entire job, but all this activity is occurring inside 
>>>>> each proc. So I'd suggest defining it as a proc state since it only 
>>>>> really involves local actions.
>>>>> 
>>>>> If you like, I can define the required code in the trunk and let you fill 
>>>>> in the event functionality.
>>>> 
>>>> That would be great.
>>> 
>>> Thanks for your changes. When using --with-ft there are a few compiler
>>> errors which I tried to fix with the following patch:
>>> 
>>> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=71521789ef9d248a7eef53030d2ec5de900faa4c
>> 
>> That looks okay, with the only caveat being that you wouldn't ordinarily 
>> pass the state_caddy_t into a function. It's just there to pass along the 
>> job etc in case the callback function needs to reference something. In this 
>> case, I can't think of anything the FT event function would need to know - 
>> you just want it to quiet all messaging.
> 
> I need to pass the type of state to the ft_event() functions:
> 
> enum opal_crs_state_type_t {
>    OPAL_CRS_NONE        = 0,
>    OPAL_CRS_CHECKPOINT  = 1,
>    OPAL_CRS_RESTART_PRE = 2,
>    OPAL_CRS_RESTART     = 3, /* RESTART_POST */
> 
> so an int is all I need. So I probably need to encode it into *cbdata. Do I
> just use an int directly in *cbdata or should it be part of a struct?

Why don't you define a job state for each of those, and then you can walk the 
state machine through them if needed? That way the state caddy will already 
provide you with the state, and you can just pass it to the functions.
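
To illustrate the idea, here is a rough standalone sketch, not actual ORTE code. All names in it (ft_state_t, caddy_t, add_state, activate_state, ft_event, walk_ft_states) are hypothetical stand-ins for the real types and for orte_state.add_job_state / ORTE_ACTIVATE_JOB_STATE, and the callback is invoked directly rather than being queued as an event. It only shows one state per checkpoint/restart phase, with the caddy carrying the state to the callback so nothing has to be packed into cbdata:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical states, one per checkpoint/restart phase (not ORTE's). */
typedef enum {
    FT_STATE_NONE = 0,
    FT_STATE_CHECKPOINT,
    FT_STATE_RESTART_PRE,
    FT_STATE_RESTART,
    FT_STATE_MAX
} ft_state_t;

/* Simplified caddy: carries the state that was activated. */
typedef struct {
    ft_state_t state;
} caddy_t;

typedef void (*state_cb_t)(const caddy_t *caddy);

static state_cb_t callbacks[FT_STATE_MAX];
static int last_ft_event_arg = -1;   /* records what ft_event() last saw */

/* Register a callback for a state; returns 0 on success, -1 on bad state. */
static int add_state(ft_state_t state, state_cb_t cb)
{
    if ((int)state < 0 || (int)state >= FT_STATE_MAX) {
        return -1;
    }
    callbacks[state] = cb;
    return 0;
}

/* Activate a state: build the caddy and call the registered callback.
 * Returns -1 if the state is invalid or has no callback. */
static int activate_state(ft_state_t state)
{
    if ((int)state < 0 || (int)state >= FT_STATE_MAX ||
        callbacks[state] == NULL) {
        return -1;
    }
    caddy_t caddy = { state };
    callbacks[state](&caddy);
    return 0;
}

/* Stand-in for an ft_event() function that only needs the state as an int. */
static void ft_event(int state)
{
    last_ft_event_arg = state;
    printf("ft_event called with state %d\n", state);
}

/* The callback just forwards the state it finds in the caddy. */
static void ft_state_cb(const caddy_t *caddy)
{
    ft_event((int)caddy->state);
}

/* Register all three phases and walk through them in order. */
static int walk_ft_states(void)
{
    const ft_state_t order[] = {
        FT_STATE_CHECKPOINT, FT_STATE_RESTART_PRE, FT_STATE_RESTART
    };
    for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
        if (add_state(order[i], ft_state_cb) != 0 ||
            activate_state(order[i]) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Calling walk_ft_states() from a main() would run the three phases in order. In the real state machine the activation would be queued and handled in priority order, which this sketch does not model.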

> 
>               Adrian
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/03/14311.php
