Multiprocessing uses a subprocess / fork / exec / wait framework, which gives us full control over the running process. It would also allow the engine to cancel checkpoints itself instead of relying on the checkpoints to implement a cancel method.
That's one item we need to discuss and get other people's opinions on. With the current model, the engine suggests that the checkpoint should quit and trusts that the checkpoint will behave and quit at its earliest convenience, so the engine really has no control over whether the checkpoint actually stops. The good thing about doing it this way is that the checkpoint can come to a "good" stopping point and quit. The disadvantage is of course that the engine has no control over what a checkpoint does.

With the MP module, and the engine controlling the checkpoint, one approach could be to roll back to the last successful dataset snapshot and end execution there. I'm also not suggesting removing the ability of the individual checkpoints to control execution. MP has similar IPC controls that can be used to tell the engine, "Hey! Something's wrong! Can you stop?"
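
To make the IPC idea concrete, here is a rough sketch of a checkpoint reporting a problem back to the engine over a multiprocessing queue. Everything named here (run_checkpoint, run_engine, the "snapshot" and "error" message kinds) is made up for illustration and is not part of the current engine. It also assumes the fork start method, so it is POSIX-only.

```python
import multiprocessing as mp
import time


def run_checkpoint(status_queue):
    # Hypothetical checkpoint: record a snapshot, then report a problem.
    status_queue.put(("snapshot", "step-1"))
    status_queue.put(("error", "Something's wrong! Can you stop?"))
    # Keep running until the engine decides what to do with us.
    while True:
        time.sleep(0.05)


def run_engine():
    # Assumption: fork start method (available on POSIX systems only).
    ctx = mp.get_context("fork")
    status_queue = ctx.Queue()
    proc = ctx.Process(target=run_checkpoint, args=(status_queue,))
    proc.start()

    last_snapshot = None
    while True:
        # Raises queue.Empty if the checkpoint stays silent for 5 seconds.
        kind, payload = status_queue.get(timeout=5)
        if kind == "snapshot":
            # Remember the last point we could roll back to.
            last_snapshot = payload
        elif kind == "error":
            # The checkpoint asked us to stop, so the engine ends it.
            proc.terminate()
            proc.join()
            return last_snapshot, payload


if __name__ == "__main__":
    print(run_engine())
```

The engine stays in charge of stopping the process, but the checkpoint still gets a say in when that happens. What "roll back to the last snapshot" would actually do is left out on purpose.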

We could certainly have separate cancel() and kill() methods if we used MP, which would allow for both. Alternatively, we could add a timeout to cancel(): if the checkpoint hasn't stopped by then, the engine could forcibly shut it down. I definitely think the MP module is worth exploring. The only potential hangup I see is finding out how easy it is to coordinate DOC updates across the separate processes, but that seems surmountable.
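
As a minimal sketch of the cancel-with-timeout idea, assuming each checkpoint runs in its own multiprocessing.Process and polls a shared Event: the function names below (cooperative_checkpoint, stubborn_checkpoint, cancel, run_and_cancel) are hypothetical, and the fork start method is assumed, so this is POSIX-only.

```python
import multiprocessing as mp
import time

# Assumption: fork start method (available on POSIX systems only).
ctx = mp.get_context("fork")


def cooperative_checkpoint(cancel_event):
    # Hypothetical well-behaved checkpoint: stops once asked to.
    while not cancel_event.is_set():
        time.sleep(0.05)


def stubborn_checkpoint(cancel_event):
    # Hypothetical misbehaving checkpoint: ignores the cancel request.
    while True:
        time.sleep(0.05)


def cancel(proc, cancel_event, timeout):
    # Ask politely first, then wait up to `timeout` seconds.
    cancel_event.set()
    proc.join(timeout)
    if proc.is_alive():
        # The checkpoint did not stop in time, so shut it down forcibly.
        proc.terminate()
        proc.join()
        return "killed"
    return "cancelled"


def run_and_cancel(target, timeout=1.0):
    cancel_event = ctx.Event()
    proc = ctx.Process(target=target, args=(cancel_event,))
    proc.start()
    return cancel(proc, cancel_event, timeout)


if __name__ == "__main__":
    print(run_and_cancel(cooperative_checkpoint))
    print(run_and_cancel(stubborn_checkpoint))
```

Process.terminate() sends SIGTERM on POSIX, so the checkpoint gets no chance to clean up; that is the trade-off against letting it reach a "good" stopping point on its own.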

- Keith
_______________________________________________
caiman-discuss mailing list
caiman-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss