On 09/24/10 05:23 PM, Keith Mitchell wrote:
>>>> Multiprocessing uses a subprocess / fork / exec / wait framework 
>>>> which gives us 100% control over the process running. It would also 
>>>> allow the engine to cancel checkpoints instead of relying on the 
>>>> checkpoints themselves to implement a cancel method.
>>> That's one item we need to discuss and get other people's opinion.  
>>> With the current model of having the engine
>>> suggest the checkpoint should quit and trust that the checkpoints 
>>> will behave and quit at their earliest convenience,
>>> the engine really has no control over whether the checkpoint quits.  The 
>>> good thing about doing it this way is that the checkpoint
>>> can come to a "good" stopping point and quit.  The disadvantage is of 
>>> course that the engine has no control
>>> over what a checkpoint does.
>>
>> With the MP module, and the engine controlling the checkpoint, one 
>> approach could be to roll back to the last successful dataset snapshot 
>> and end execution there.   I'm also not suggesting removing the 
>> ability of the individual checkpoints to control execution.  MP has 
>> similar IPC controls that can be used to tell the engine, "Hey!  
>> Something's wrong!  Can you stop?"
> 
> We could certainly have separate cancel() and kill() methods if we used 
> MP, which would allow for both, or add a timeout to cancel() so that, if 
> the checkpoint didn't cease by that time, the engine could forcibly shut 
> it down. I definitely think the MP module is worth exploring. I think 
> the only potential hangup is finding out how easy it is to 
> coordinate DOC updates across the separate processes, but that seems 
> like it would be surmountable.
> 
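
For what it's worth, the cancel-then-kill combination is straightforward to
express with multiprocessing. A rough sketch - every name below is made up for
illustration, none of it is existing engine code:

import multiprocessing as mp
import time

def run_checkpoint(cancel_event):
    # A well-behaved checkpoint polls the event and quits at a "good"
    # stopping point of its own choosing.
    for step in range(100):
        if cancel_event.is_set():
            return
        time.sleep(0.1)            # stand-in for real work

if __name__ == '__main__':
    cancel_event = mp.Event()
    proc = mp.Process(target=run_checkpoint, args=(cancel_event,))
    proc.start()

    # cancel(): ask nicely and allow a grace period.
    cancel_event.set()
    proc.join(timeout=5)

    # kill(): if the checkpoint is still running after the timeout,
    # the engine forcibly shuts it down.
    if proc.is_alive():
        proc.terminate()
        proc.join()
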
At the moment the DOC is a single instance, owned by the Engine - it was never
intended to be used cross-process like this - and making it work that way could
significantly change how things are implemented, since the address space is not
shared.

The 'simplest' option is to pass a copy of the DOC to the created sub-process
(using Connection.send/recv) while it's active, and then copy it back when the
sub-process finishes, replacing the parent process's DOC. While I don't actually
expect the volume of data in the DOC to be huge, it would seem excessive to pass
a copy of it - but it could be done, and is aided by the existing restrictions
on the DOC implementation w.r.t. Pickle and that people shouldn't be holding
direct references to objects stored in it.
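
To make the copy-in/copy-out idea concrete, here's a minimal sketch assuming
the DOC is picklable (which the existing Pickle restriction already implies) -
the 'doc' dictionary below is only a stand-in for the real DOC object:

import multiprocessing as mp

def checkpoint_main(conn):
    doc = conn.recv()                   # receive a pickled copy of the DOC
    doc['last_checkpoint'] = 'example'  # the checkpoint mutates its own copy
    conn.send(doc)                      # send the modified copy back
    conn.close()

if __name__ == '__main__':
    doc = {'target': 'example-target'}             # stand-in for the real DOC
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=checkpoint_main, args=(child_conn,))
    proc.start()
    parent_conn.send(doc)       # copy the DOC to the sub-process
    doc = parent_conn.recv()    # replace the parent process's DOC when done
    proc.join()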

An alternative is for the DOC to provide a 'remote access' API - e.g. a get/set
mechanism using paths, with the objects passed via the Connection.send/recv
methods.
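
A sketch of how that could look - the ('get', path) / ('set', path, value)
message format and all of the names here are hypothetical, just to illustrate
the shape of it:

import multiprocessing as mp

def checkpoint_main(conn):
    # Checkpoint side: ask the Engine for values by path, push updates back.
    conn.send(('get', 'desired/target'))
    target = conn.recv()
    # ... do some work against 'target' ...
    conn.send(('set', 'desired/target/state', 'installed'))
    conn.send(('done',))

def serve_doc(conn, doc):
    # Engine side: service get/set requests against its own DOC until the
    # checkpoint says it's finished.
    while True:
        request = conn.recv()
        if request[0] == 'get':
            conn.send(doc.get(request[1]))
        elif request[0] == 'set':
            doc[request[1]] = request[2]
        elif request[0] == 'done':
            break

if __name__ == '__main__':
    doc = {'desired/target': 'example-target'}     # stand-in for the real DOC
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=checkpoint_main, args=(child_conn,))
    proc.start()
    serve_doc(parent_conn, doc)
    proc.join()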

Other options are some mix of proxy objects and shared memory - but no option is
really simple.
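
For completeness, the proxy-object flavour is roughly what multiprocessing's
Manager gives you - a separate server process owns the data and hands out
proxies that forward operations to it. Again just a toy with stand-in data,
not the real DOC:

import multiprocessing as mp

def checkpoint_main(shared_doc):
    # Operations on the proxy are forwarded to the manager process.
    shared_doc['last_checkpoint'] = 'example'

if __name__ == '__main__':
    manager = mp.Manager()
    shared_doc = manager.dict({'target': 'example-target'})
    proc = mp.Process(target=checkpoint_main, args=(shared_doc,))
    proc.start()
    proc.join()
    print(dict(shared_doc))    # the parent sees the checkpoint's update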

Honestly, I'm not convinced of the benefits of using the multi-process modules
in the case of the Engine, since the Engine is effectively synchronous in its
operation - running one checkpoint at a time. If it were expected to be running
many checkpoints in parallel, then I could see some value in the change and the
effort required to make the switch.

Thanks,

Darren.