On 09/29/10 06:07 AM, Darren Kenny wrote:
On 09/24/10 05:23 PM, Keith Mitchell wrote:
Multiprocessing uses a subprocess / fork / exec / wait framework
which gives us 100% control over the running process. It would also
allow the engine to cancel checkpoints instead of relying on the
checkpoints themselves to implement a cancel method.
That's one item we need to discuss and get other people's opinions on.
With the current model, the engine suggests that a checkpoint should
quit and trusts that the checkpoint will behave and quit at its
earliest convenience; the engine really has no control over whether
the checkpoint actually stops.  The good thing about doing it this way
is that the checkpoint can come to a "good" stopping point and quit.
The disadvantage, of course, is that the engine has no control
over what a checkpoint does.
With the MP module, and the engine controlling the checkpoint, one
approach could be to roll back to the last successful dataset snapshot
and end execution there.   I'm also not suggesting removing the
ability of the individual checkpoints to control execution.  MP has
similar IPC controls that can be used to tell the engine, "Hey!
Something's wrong!  Can you stop?"
We could certainly have separate cancel() and kill() methods if we used
MP, which would allow for both, or add a timeout to cancel(): if the
checkpoint didn't cease by that time, the engine could forcibly shut
it down. I definitely think the MP module is worth exploring. The only
potential hangup is finding out how easy it is to coordinate DOC
updates across the separate processes, but that seems surmountable.
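For concreteness, that cancel-with-timeout idea could be sketched as
below. This is only an illustration, not engine code: `run_checkpoint`,
`stop_event`, and `cancel_with_timeout` are hypothetical names, and the
cooperative stop flag stands in for whatever the real checkpoints poll.

```python
import multiprocessing
import time

def run_checkpoint(stop_event):
    # Illustrative checkpoint body: poll the cooperative stop flag
    # between units of work, as checkpoints are trusted to do today.
    while not stop_event.is_set():
        time.sleep(0.05)  # stand-in for a unit of real work

def cancel_with_timeout(proc, stop_event, timeout=5.0):
    # Ask nicely first (the cancel() case), then force it (the
    # kill() case) if the checkpoint doesn't cease in time.
    stop_event.set()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()  # engine forcibly shuts the checkpoint down
        proc.join()

if __name__ == '__main__':
    stop = multiprocessing.Event()
    p = multiprocessing.Process(target=run_checkpoint, args=(stop,))
    p.start()
    cancel_with_timeout(p, stop, timeout=2.0)
    print(p.is_alive())  # False
```

The same shape covers both paths: a well-behaved checkpoint exits on the
Event, and a stuck one gets terminate()'d after the timeout.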

At the moment the DOC is a single instance owned by the Engine - it was
never intended to be used cross-process like this - and making a change like
this could significantly change how things are implemented, since the address
space is not shared.

The 'simplest' option is to pass a copy of the DOC to the created
sub-process (using Connection.send/recv) while it's active, and then copy
it back when the sub-process finishes, replacing the parent process's DOC.
While I don't actually expect the volume of data in the DOC to be huge, it
would seem excessive to pass a copy of it - but it could be done, and is aided
by the existing restrictions on the DOC implementation w.r.t. Pickle and the
rule that people shouldn't hold direct references to objects stored in it.
This would work, and would allow us to use the multiprocessing module to
run checkpoints.  Unfortunately, if we do it this way, we won't get the
true parallelism offered by the multiprocessing module.
The engine will pass a copy of the DOC into one sub-process, retrieve
the results, and pass them to the next process. If we run two processes
in parallel and they both need to update the DOC, then the engine will
have to merge the changes to the DOC somehow, which makes this no longer
a simple solution.
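A minimal sketch of that copy-in/copy-out round trip, assuming the DOC
snapshot pickles cleanly (a plain dict stands in for the real DOC, and
`checkpoint_worker` / `run_with_doc_copy` are hypothetical names):

```python
import multiprocessing

def checkpoint_worker(conn):
    # Receive the DOC copy, mutate it, and send the result back.
    doc = conn.recv()
    doc['status'] = 'done'  # stand-in for real checkpoint updates
    conn.send(doc)
    conn.close()

def run_with_doc_copy(doc):
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=checkpoint_worker,
                                args=(child_conn,))
    p.start()
    parent_conn.send(doc)         # pass a copy of the DOC in
    updated = parent_conn.recv()  # copy it back when it finishes
    p.join()
    return updated                # replaces the parent process's DOC

if __name__ == '__main__':
    print(run_with_doc_copy({'status': 'pending'}))  # {'status': 'done'}
```

Note that because the whole DOC is serialized through the pipe, two such
workers running at once would each return a divergent copy - exactly the
merge problem described above.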

An alternative is for the DOC to provide a 'remote access' API - e.g. a
get/set mechanism using paths, with objects passed via the
Connection.send/recv methods.
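One way such a path-based get/set protocol might look, sketched with a
plain dict standing in for the DOC; the message format and the names
`serve_doc` / `checkpoint` are invented for illustration:

```python
import multiprocessing

def serve_doc(conn, doc):
    # Engine-side loop: the DOC stays in the engine's process and
    # sub-processes reach it only through path-based requests.
    while True:
        msg = conn.recv()
        if msg[0] == 'get':
            conn.send(doc.get(msg[1]))
        elif msg[0] == 'set':
            doc[msg[1]] = msg[2]
            conn.send(True)
        elif msg[0] == 'quit':
            conn.close()
            return

def checkpoint(conn):
    # Checkpoint-side: set a value by path, read it back, then quit.
    conn.send(('set', '/target/disk', 'c0t0d0'))
    conn.recv()                       # ack
    conn.send(('get', '/target/disk'))
    conn.recv()                       # value
    conn.send(('quit',))

if __name__ == '__main__':
    doc = {}
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=checkpoint, args=(child_conn,))
    p.start()
    serve_doc(parent_conn, doc)
    p.join()
    print(doc)  # {'/target/disk': 'c0t0d0'}
```

The appeal here is that only the objects actually touched cross the
process boundary, and the engine's copy of the DOC is always current.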

Other options are some mix of proxy objects and shared memory - but no option is
really simple.
I was looking into the different shared memory solutions provided
by the multiprocessing module, and thought that the customized shared
memory manager it provides would be simple enough to solve the problem.

http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes
http://docs.python.org/library/multiprocessing.html#customized-managers

Unfortunately, after writing a simple test program to try it out,
I realized that we would need to create proxies for all the DOC objects
and their calls for the manager. So it is not the simple solution I was
hoping for.
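For reference, the customized-manager approach from the second link
looks roughly like this. The `Doc` class here is a flat stand-in, not
the real DOC; with the real tree of DOC objects, every object a method
returned would need its own registered proxy type, which is where the
simplicity breaks down.

```python
import multiprocessing
from multiprocessing.managers import BaseManager

class Doc(object):
    # Stand-in for the real DOC: a flat, path-keyed store.
    def __init__(self):
        self._data = {}

    def get(self, path):
        return self._data.get(path)

    def set(self, path, value):
        self._data[path] = value

class DocManager(BaseManager):
    pass

# Register the DOC so the manager serves it through a proxy.  Calls on
# the proxy are forwarded to the one real instance in the manager's
# process, so concurrent checkpoints see consistent state.
DocManager.register('Doc', Doc)

def checkpoint(doc):
    doc.set('/target/disk', 'c0t0d0')

if __name__ == '__main__':
    mgr = DocManager()
    mgr.start()
    doc = mgr.Doc()  # returns a proxy, not the object itself
    p = multiprocessing.Process(target=checkpoint, args=(doc,))
    p.start()
    p.join()
    print(doc.get('/target/disk'))  # c0t0d0
    mgr.shutdown()
```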

Honestly, I'm not convinced of the benefits of using the multiprocessing
module in the case of the Engine, since the Engine is effectively
synchronous in its operation - running one checkpoint at a time. If it
were expected to run many checkpoints in parallel, then I could see some
value in the change and the effort required to make the switch.
I agree with Darren on not trying to use the multiprocessing module
at this time.  The CUD architecture relies on using the DOC to
pass data between applications and checkpoints, and between checkpoints.
Using the multiprocessing module to run checkpoints would require a
non-trivial amount of work to update the engine and the DOC so data can
be shared between processes safely.

Thanks,

--Karen

_______________________________________________
caiman-discuss mailing list
caiman-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss