Hi Karen,

Do we really support this?

I would think that if the base set of checkpoints changes between runs, it's
not valid to resume anything out of sequence. Once you've inserted checkpoints
before one that has already run, you need to step back at least to the common
point before a resume can be done.

That's a basic premise of resuming, I would think, and we should detect this
before allowing such a resume.

Am I wrong here?

Thanks,

Darren.

On 06/ 4/10 04:58 PM, Karen Tung wrote:
> Hi Darren,
>
> Thank you for sending out the example below.
> The example you provided below indeed works OK.
>
> However, supporting ManifestParser as a checkpoint that is
> executed before any resume request means that the engine
> will need to allow a resume_execute() request after
> execution has started.
>
> The following example illustrates the problem with the
> engine allowing resume after execution has started
> in the general case.
>
> Run 1 of the application successfully runs checkpoints A, B, C, D.
> The persistent DOC will have information about these.
>
> Run 2 of the application registers checkpoints A1, B1, C1 and requests
> the engine to run all of them. They all run successfully, and the
> persistent DOC now has info about A1, B1, C1. Then the application
> registers checkpoints A, B, C, D, and wants to resume at checkpoint B.
> The engine rolls back to checkpoint B and loses all information
> about A1, B1, C1.
>
> As you can see, if we allow any "resume" request after
> execute_checkpoint() has run, we will run into problems.
>
> Thanks,
>
> --Karen
>
> On 06/ 4/10 06:21 AM, Darren Kenny wrote:
>> Hi,
>>
>> Just thought that I'd mention this example after talking with Dermot
>> after yesterday's meeting...
>>
>> I believe that one of the concerns was that, on resumption of DC, we need
>> to call ManifestParser again, and the data currently in the Persistent data
>> tree would conflict with this, since it would contain the fact that
>> ManifestParser was already run.
>>
>> I don't think that this is the case, unless you reload a snapshot, and you
>> cannot do that until after you've run ManifestParser, since you need to do
>> that to get the location of DC's work dir (/rpool/dc) from the manifest...
>>
>> So I don't believe this is a problem, because each time you run ManifestParser
>> you will be starting with an empty DOC (unless the Application puts something
>> in there, of course).
>>
>> I've tried to do this "visually" below...
>>
>> Hope that resolves the issue being referred to, but if not, please feel free
>> to provide me with a specific scenario that I can try to work through.
>>
>> Thanks,
>>
>> Darren.
>>
>>
>>
>> ===========================================================
>> FIRST RUN
>>
>> Add ManifestParser Checkpoint
>>
>> +--------------------------------------------------------------+
>> |
>> | DOC:
>> |   Persistent Data               Volatile Data
>> |
>> |   - Completed Checkpoints       - Checkpoints to Run
>> |       - EMPTY                       - ManifestParser
>> |
>> +--------------------------------------------------------------+
>>
>> Call Engine.execute()
>> Add Application-specific checkpoints.
>>
>> +--------------------------------------------------------------+
>> |
>> | DOC:
>> |   Persistent Data               Volatile Data
>> |
>> |   - Completed Checkpoints       - Checkpoints to Run
>> |       - ManifestParser              - TargetDiscovery
>> |                                     - TI
>> |                                     - TransferIPS
>> |                                     - Transfer(s)
>> |                                     - Finalizer(s)
>> |
>> |   - DC Workdir = /rpool/dc
>> |
>> +--------------------------------------------------------------+
>>
>>
>> #Run up to TransferIPS
>> Engine.execute(end=TransferIPS)
>>
>> +--------------------------------------------------------------+
>> |
>> | DOC:
>> |   Persistent Data               Volatile Data
>> |
>> |   - Completed Checkpoints       - Checkpoints to Run
>> |       - ManifestParser              - TargetDiscovery
>> |       - TargetDiscovery             - TI
>> |       - TI                          - TransferIPS
>> |       - TransferIPS                 - Transfer(s)
>> |                                     - Finalizer(s)
>> |
>> |   - DC Workdir = /rpool/dc
>> |
>> +--------------------------------------------------------------+
>>
>> ===========================================================
>>
>> Now if we stop DC, and then attempt to resume:
>>
>> ===========================================================
>> +--------------------------------------------------------------+
>> |
>> | DOC:
>> |   Persistent Data               Volatile Data
>> |
>> |   - Completed Checkpoints       - Checkpoints to Run
>> |       - EMPTY                       - ManifestParser
>> |
>> +--------------------------------------------------------------+
>>
>> Call Engine.execute()
>> Add Application-specific checkpoints.
>>
>> Look for resume-able checkpoints using DC Workdir info...
>>
>> +--------------------------------------------------------------+
>> |
>> | DOC:
>> |   Persistent Data               Volatile Data
>> |
>> |   - Completed Checkpoints       - Checkpoints to Run
>> |       - ManifestParser              - TargetDiscovery
>> |                                     - TI
>> |                                     - TransferIPS
>> |                                     - Transfer(s)
>> |                                     - Finalizer(s)
>> |
>> |   - DC Workdir = /rpool/dc
>> |
>> +--------------------------------------------------------------+
>>
>> Found one; we want to resume from Transfer, so re-load the snapshot from
>> the last run:
>>
>> +--------------------------------------------------------------+
>> |
>> | DOC:
>> |   Persistent Data               Volatile Data
>> |
>> |   - Completed Checkpoints       - Checkpoints to Run
>> |       - ManifestParser              - TargetDiscovery
>> |       - TargetDiscovery             - TI
>> |       - TI                          - TransferIPS
>> |       - TransferIPS                 - Transfer(s)
>> |                                     - Finalizer(s)
>> |
>> |   - DC Workdir = /rpool/dc
>> |
>> +--------------------------------------------------------------+
>>
>> #Now we run until the end:
>> Engine.execute()
>>
>> ===========================================================
>>
>>
>>
>
_______________________________________________
caiman-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
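
As a rough illustration of the "common point" premise in the reply at the top
of this thread, here is a minimal Python sketch. It is not the engine's actual
code: the function names (common_prefix_len, can_resume_from) and the use of
plain lists of checkpoint names are assumptions made for illustration only. It
treats a resume at a checkpoint as valid only when every checkpoint registered
before it also completed, in the same order, in the previous run.

```python
def common_prefix_len(completed, registered):
    """Count the leading checkpoint names that both sequences share."""
    n = 0
    for done, planned in zip(completed, registered):
        if done != planned:
            break
        n += 1
    return n


def can_resume_from(completed, registered, target):
    """Return True if resuming at `target` stays within the common prefix.

    Hypothetical rule: resuming at `target` means `target` runs next, so
    every checkpoint registered before it must already have completed, in
    the same order, according to the persistent record of the last run.
    """
    if target not in registered:
        return False
    idx = registered.index(target)
    return idx <= common_prefix_len(completed, registered)


# Karen's Run 2: the persistent record holds A1, B1, C1, but the
# application now registers A, B, C, D and asks to resume at B.
karen_completed = ["A1", "B1", "C1"]
karen_registered = ["A", "B", "C", "D"]
karen_ok = can_resume_from(karen_completed, karen_registered, "B")

# DC walkthrough: the last run completed everything up to TransferIPS.
dc_completed = ["ManifestParser", "TargetDiscovery", "TI", "TransferIPS"]
dc_registered = dc_completed + ["Transfer", "Finalizer"]
dc_ok = can_resume_from(dc_completed, dc_registered, "Transfer")
```

Under these assumptions the resume at B in Karen's scenario is rejected,
because the two checkpoint lists share no common prefix, while the resume from
Transfer in the DC walkthrough is accepted, because ManifestParser through
TransferIPS match between the two runs.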

