Hi Karen,

On 05/25/10 09:04 PM, Karen Tung wrote:
> Hi Darren,
>
> Thanks for the detailed explanation. I understand
> we are proposing to have all checkpoint information
> stored in the DOC at all times.
> This proposal does not work for supporting
> resume in the engine.
>
> The proposal calls for the application/engine to create
> these checkpoint nodes for storing all the checkpoint
> information, and to store them in the DOC at registration time.
> Then, at execute time, the engine will get the list of checkpoints
> from the DOC and do its work. This will work for the
> non-resume case. However, it will not work for the resume case.
>
> Below is an example of a resume case that does not work. This assumes
> the application registers the checkpoints via the engine,
> and the engine creates the checkpoint nodes and stores the
> information there immediately upon registration. The engine
> will not keep a copy of what's already stored in the DOC, since
> it can just "look it up" when it needs the info.
>
> * First invocation of application:
>
> - application registers checkpoints a, b, c, d, e; these checkpoints
>   are stored in the DOC immediately.
> - application runs all the checkpoints successfully. The "completed"
>   flag in each checkpoint is set. This implies that
>   all the checkpoints are resumable.
> - application exits
>
> * 2nd invocation of application:
> - application registers checkpoints a, b, b1, b2, b3, b4; these
>   checkpoints are stored in the DOC immediately.
> - application calls engine.restore(latest-snapshot-from-previous-run).
> - At this time, the DOC is restored back to the state it was in when
>   the first invocation of the application ended. Information about all
>   the checkpoints registered during this invocation of the application
>   is now lost!
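To make the ordering question in the scenario above concrete, here is a minimal Python sketch of the second invocation done two ways: registering and then restoring, versus restoring first and then merging the new checkpoints in. Everything here is hypothetical: `ToyDOC`, `merge_register`, and the shared `store` dict (standing in for snapshots that survive between invocations) are illustrative stand-ins, not the actual engine or DOC API, and the merge rule (an unknown checkpoint goes right after the previously registered one) is just one assumed way of "inserting the new checkpoints where the application wants them".

```python
import copy


class ToyDOC:
    """Hypothetical stand-in for the DOC: an ordered list of checkpoint records."""

    def __init__(self, snapshot_store):
        self.checkpoints = []         # list of {"name": str, "completed": bool}
        self._store = snapshot_store  # assumed to persist across invocations

    def take_snapshot(self, label):
        # Deep copy so later updates do not leak into the saved snapshot.
        self._store[label] = copy.deepcopy(self.checkpoints)

    def restore(self, label):
        # A roll-back: anything added since the snapshot was taken is discarded.
        self.checkpoints = copy.deepcopy(self._store[label])

    def names(self):
        return [cp["name"] for cp in self.checkpoints]


def merge_register(doc, names):
    """Hypothetical merge: known checkpoints are left untouched (keeping their
    'completed' flag); unknown ones are inserted right after the previously
    registered name."""
    prev_index = -1
    for name in names:
        current = doc.names()
        if name in current:
            prev_index = current.index(name)
        else:
            prev_index += 1
            doc.checkpoints.insert(prev_index, {"name": name, "completed": False})


store = {}

# First invocation: register a..e, run them all, snapshot at the end.
run1 = ToyDOC(store)
merge_register(run1, ["a", "b", "c", "d", "e"])
for cp in run1.checkpoints:
    cp["completed"] = True
run1.take_snapshot("end-of-run-1")

# Second invocation, order from the scenario: register first, then restore.
bad = ToyDOC(store)
merge_register(bad, ["a", "b", "b1", "b2", "b3", "b4"])
bad.restore("end-of-run-1")
print(bad.names())   # ['a', 'b', 'c', 'd', 'e']  (b1..b4 rolled back)

# Second invocation, restore first, then merge the new checkpoints in.
good = ToyDOC(store)
good.restore("end-of-run-1")
merge_register(good, ["a", "b", "b1", "b2", "b3", "b4"])
print(good.names())  # ['a', 'b', 'b1', 'b2', 'b3', 'b4', 'c', 'd', 'e']
```

In the restore-first order the restored "completed" flags on a and b survive, while b1 to b4 start out as not completed. Whether the leftover c, d, e from the first run should be kept or dropped is not settled in this thread; the sketch simply keeps them.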
I'm fairly sure that I mentioned that the restore needs to be done *first*,
and then the application registers the checkpoints - in this case it would
have to be done as a merge, i.e. the application inserts the new checkpoints
where it wants them to be.

It would *never* work if you loaded a snapshot *after* putting anything into
the DOC - a restore is a roll-back to that snapshot, which automatically
implies that any update done since that snapshot was taken will be lost.

>
> Therefore, I have always wanted to keep all information about registered
> checkpoints for each particular invocation of the application as *PRIVATE*
> data in the engine. This data will not be stored in the DOC unless the
> engine chooses to. All the data the engine eventually decides to store
> in the DOC will only be used for computing what's resumable in the next
> invocation of the application. Therefore, only data regarding
> successfully executed checkpoints will be stored.

I don't agree, and see no reason for it to be separate from the DOC.

Am I missing something?

Thanks,

Darren.

_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss

