Re: [caiman-discuss] Checkpoint DOC node proposal

Darren Kenny Wed, 26 May 2010 04:23:19 -0700

Hi Karen,

On 05/25/10 09:04 PM, Karen Tung wrote:
> Hi Darren,
> 
> Thanks for the detailed explanation.  I understand
> we are proposing to have all checkpoint information
> be stored in the DOC at all times.
> This proposal does not work for supporting
> resume in the engine.
> 
> The proposal calls for the application/engine to create
> these checkpoint nodes for storing all the checkpoint
> information, and store them in DOC upon registration time.
> Then, at execute time, the engine will get the list of checkpoints
> from the DOC and do its work.  This will work for the
> non-resume case.  However, it will not work for the resume case.
> 
> Below is an example of a resume case that does not work.  This assumes
> the application registers the checkpoints via the engine,
> and engine creates the checkpoint nodes and stores the
> information there immediately upon registration.  The engine
> will not keep a copy of what's already stored in the DOC since
> it can just "look it up" when it needs the info.
> 
> * First invocation of application:
> 
> - application registers checkpoints a, b, c, d, e, these checkpoints are
> stored in the DOC immediately.
> - application runs all the checkpoints successfully.  The "completed"
> flag in the checkpoint is set.  This implies that
> all the checkpoints are resumable.
> - Application exits
> 
> * 2nd invocation of application:
> - application registers checkpoints a, b, b1, b2, b3, b4, these checkpoints
> are stored in the DOC immediately.
> - application calls engine.restore(latest-snapshot-from-previous-run).
> - At this time, the DOC is restored back to the state when the first
> invocation of the application ends.  Information about all the checkpoints
> registered during this invocation of the application is now lost!


I'm fairly sure that I mentioned that the restore needs to be done *first*, and
then the application registers the checkpoints - in this case it would have to
be done as a merge, i.e. the application inserts the new checkpoints where it
wants them to be.

It would *never* work if you loaded a snapshot *after* putting anything into it
- it's a roll-back to that snapshot which automatically implies that any update
done since that snapshot was taken will be lost.
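
To make the ordering concrete, here is a minimal sketch of the two sequences.
It is only an illustration under stated assumptions: ToyDOC, register() and
merge_register() are hypothetical names, not the actual DOC or engine API, and
the merge rule (keep the leading checkpoints that match the previous run, treat
everything after the first difference as new) is just one possible way to
"insert the new checkpoints where the application wants them".

```python
import copy


class ToyDOC:
    """Hypothetical stand-in for the DOC: an ordered list of checkpoint records."""

    def __init__(self):
        self.checkpoints = []  # each entry: {"name": str, "completed": bool}

    def snapshot(self):
        return copy.deepcopy(self.checkpoints)

    def restore(self, snap):
        # A restore is a roll-back: anything added since the snapshot is lost.
        self.checkpoints = copy.deepcopy(snap)

    def names(self):
        return [cp["name"] for cp in self.checkpoints]


def register(doc, names):
    """Plain registration: store the given checkpoints in the DOC immediately."""
    doc.checkpoints = [{"name": n, "completed": False} for n in names]


def merge_register(doc, names):
    """Assumed merge rule: keep restored records while the names still match
    the previous run; from the first difference onwards, insert new records."""
    old = doc.checkpoints
    merged = []
    still_matching = True
    for i, name in enumerate(names):
        if still_matching and i < len(old) and old[i]["name"] == name:
            merged.append(old[i])
        else:
            still_matching = False
            merged.append({"name": name, "completed": False})
    doc.checkpoints = merged


def first_run():
    """First invocation: register a..e, run them all, return the final snapshot."""
    doc = ToyDOC()
    register(doc, ["a", "b", "c", "d", "e"])
    for cp in doc.checkpoints:
        cp["completed"] = True
    return doc.snapshot()


SECOND_RUN = ["a", "b", "b1", "b2", "b3", "b4"]


def register_then_restore(snap):
    """The failing order from the example above: the restore wipes the new list."""
    doc = ToyDOC()
    register(doc, SECOND_RUN)
    doc.restore(snap)
    return doc.names()


def restore_then_merge(snap):
    """The order I am proposing: restore first, then merge the registration."""
    doc = ToyDOC()
    doc.restore(snap)
    merge_register(doc, SECOND_RUN)
    resumable = [cp["name"] for cp in doc.checkpoints if cp["completed"]]
    return doc.names(), resumable
```

With the snapshot from first_run(), register_then_restore() ends up with the
old list a..e again, whereas restore_then_merge() ends up with the new list
and still knows that a and b completed in the previous run.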

> 
> Therefore, I have always wanted to keep all information about registered
> checkpoints for each particular invocation of the application as *PRIVATE*
> data in the engine.  This data will not be stored in the DOC unless the
> engine chooses to.  All the data the engine eventually decides to store
> in DOC will only be used for computing what's resumable in the next
> invocation of the application.  Therefore, only data regarding
> successfully executed checkpoints will be stored.

I don't agree, and see no reason for it to be separate from the DOC.

Am I missing something?

Thanks,

Darren.

_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
