Hi Karen,

----- [email protected] wrote:

> On 06/ 4/10 09:19 AM, Darren Kenny wrote:
> > Hi Karen,
> >
> > Do we really support this?
> >
> > I would think that if the base set of checkpoints changes between
> > runs, it's not valid to resume anything out of sequence - once you've
> > inserted checkpoints before one already run, you need to step back to
> > the common point at least before a resume could be done.
> >
> > That's a basic premise of resuming things I would think, and we
> > should discover this before allowing such a resume.
> >
> > Am I wrong here?
> >
>
> Hi Darren,
>
> You are right that the use case I came up with is not valid.
> Sorry about the confusion.

No problem...

> Even with an identical checkpoint list on the 2nd invocation,
> there is still a problem. In addition to taking DOC snapshots, the
> engine also takes snapshots of the install target ZFS dataset after
> each checkpoint successfully completes, if the ZFS dataset is
> available.
>
> Assuming we have the following checkpoints with the following
> functionalities:
>
> checkpoint A: loads the manifest
> checkpoint B: checks whether the install target specified in the
>               manifest exists; if so, removes it.
> checkpoint C: creates the install target zfs dataset
> checkpoint D: copies some files into install target
> checkpoint E: modifies some files in install target.
> checkpoint F: modifies more files in install target.
>
> Run 1: registered and ran checkpoints A, B, C, D, E and F
> After running all the checkpoints successfully, the engine would
> have created the following ZFS snapshots:
>
>     <install_target_zfs_dataset>@C
>     <install_target_zfs_dataset>@D
>     <install_target_zfs_dataset>@E
>     <install_target_zfs_dataset>@F
>
> There's no snapshot for checkpoints A and B, because the
> install target zfs dataset does not exist yet.
> In the next invocation of the application, users should be allowed
> to resume from D, E and F.
>
> Run 2: register and run A, B, and C. Then, register D, E and F.
> User wants to resume at E. This should be allowed
> normally, but this cannot be done since all the snapshots are
> destroyed by B.
>
> The above situation can be avoided if we do not allow any
> execute_checkpoint request before a resume request.
>
> For the case of running manifest_parser and then doing the resume,
> it will work. However, to have manifest_parser as a checkpoint means
> the engine will need to allow the general case of resume after
> execution of checkpoints, and that will not work all the time.
>

Even though B doesn't have a zfs snapshot, it is by definition included
in the snapshot of C, since a snapshot of C would include any data in
the DOC that was generated by checkpoint B. As such, to resume C, D, E
or F, you wouldn't have to re-run B, so I don't think this is an issue.

In general, I agree that running something like checkpoint B repeatedly
could be a problem - but I also feel that this is something that an
Application developer would have to be aware of at the time of setting
this sequence up, since allowing such a destructive checkpoint to
re-execute would totally dismiss the possibility of any other checkpoint
beyond it being resumed.

But I really think that this is an unlikely thing to happen validly -
it's like calling free() on a global variable - you should never do that
unless you are seen to "own" it, and if you do it's most likely a bug...

Thanks,

Darren.

> Thanks,
>
> --Karen
>
> > Thanks,
> >
> > Darren.
> >
> > On 06/ 4/10 04:58 PM, Karen Tung wrote:
> >
> >> Hi Darren,
> >>
> >> Thank you for sending out the example below.
> >> The example you provided below indeed works OK.
> >>
> >> However, to support ManifestParser being a checkpoint,
> >> and be executed before any resume request means that
> >> the engine will need to allow resume_execute() request
> >> after execution has started.
> >>
> >> The following example illustrates the problem of the
> >> engine allowing resume after execution has started
> >> in the general case.
> >>
> >> Run 1 of application successfully runs checkpoints A, B, C, D.
> >> Persistent DOC will have information about these.
> >>
> >> Run 2 of application registers checkpoints A1, B1, C1. Requests
> >> engine to run all of them. They all run successfully, and persistent
> >> DOC now has info about A1, B1, C1. Then, application registers
> >> checkpoints A, B, C, D, and wants to resume at checkpoint B. Engine
> >> rolls back to checkpoint B, and loses all information about
> >> A1, B1, C1.
> >>
> >> As you can see, if we allow any "resume" request after
> >> execute_checkpoint() has run, we will run into problems.
> >>
> >> Thanks,
> >>
> >> --Karen
> >>
> >> On 06/ 4/10 06:21 AM, Darren Kenny wrote:
> >>
> >>> Hi,
> >>>
> >>> Just thought that I'd like to mention this example after talking
> >>> with Dermot after yesterday's meeting...
> >>>
> >>> I believe that one of the concerns was that on resumption of DC,
> >>> when we need to call ManifestParser again, the data currently in
> >>> the Persistent data tree would conflict with this since it would
> >>> contain the fact that ManifestParser was already run.
> >>>
> >>> I don't think that this is the case, unless you reload a snapshot,
> >>> and you cannot do that until after you've run ManifestParser -
> >>> since you need to do that to get the location of DC's work dir
> >>> (/rpool/dc) from the manifest...
> >>>
> >>> So I don't believe this is a problem because each time you run
> >>> ManifestParser you will be starting with an empty DOC (unless the
> >>> Application puts something in there of course).
> >>>
> >>> I've tried to do this "visually" below...
> >>>
> >>> Hope that resolves the issue being referred to, but if not, please
> >>> feel free to provide me with a specific scenario that I can try to
> >>> work through.
> >>>
> >>> Thanks,
> >>>
> >>> Darren.
> >>>
> >>> ===========================================================
> >>> FIRST RUN
> >>>
> >>> Add ManifestParser Checkpoint
> >>>
> >>> +--------------------------------------------------------------+
> >>> |
> >>> | DOC:
> >>> |   Persistent Data                Volatile Data
> >>> |
> >>> |   - Completed Checkpoints        - Checkpoints to Run
> >>> |       - EMPTY                        - ManifestParser
> >>> |
> >>> +--------------------------------------------------------------+
> >>>
> >>> Call Engine.execute()
> >>> Add Application-specific checkpoints.
> >>>
> >>> +--------------------------------------------------------------+
> >>> |
> >>> | DOC:
> >>> |   Persistent Data                Volatile Data
> >>> |
> >>> |   - Completed Checkpoints        - Checkpoints to Run
> >>> |       - ManifestParser               - TargetDiscovery
> >>> |                                      - TI
> >>> |                                      - TransferIPS
> >>> |                                      - Transfer(s)
> >>> |                                      - Finalizer(s)
> >>> |
> >>> |                                  - DC Workdir = /rpool/dc
> >>> |
> >>> +--------------------------------------------------------------+
> >>>
> >>> #Run up to TransferIPS
> >>> Engine.execute(end=TransferIPS)
> >>>
> >>> +--------------------------------------------------------------+
> >>> |
> >>> | DOC:
> >>> |   Persistent Data                Volatile Data
> >>> |
> >>> |   - Completed Checkpoints        - Checkpoints to Run
> >>> |       - ManifestParser               - TargetDiscovery
> >>> |       - TargetDiscovery              - TI
> >>> |       - TI                           - TransferIPS
> >>> |       - TransferIPS                  - Transfer(s)
> >>> |                                      - Finalizer(s)
> >>> |
> >>> |                                  - DC Workdir = /rpool/dc
> >>> |
> >>> +--------------------------------------------------------------+
> >>>
> >>> ===========================================================
> >>>
> >>> Now if we stop DC, and then attempt to resume:
> >>>
> >>> ===========================================================
> >>> +--------------------------------------------------------------+
> >>> |
> >>> | DOC:
> >>> |   Persistent Data                Volatile Data
> >>> |
> >>> |   - Completed Checkpoints        - Checkpoints to Run
> >>> |       - EMPTY                        - ManifestParser
> >>> |
> >>> +--------------------------------------------------------------+
> >>>
> >>> Call Engine.execute()
> >>> Add Application-specific checkpoints.
> >>>
> >>> Look for resume-able checkpoints using DC Workdir info...
> >>>
> >>> +--------------------------------------------------------------+
> >>> |
> >>> | DOC:
> >>> |   Persistent Data                Volatile Data
> >>> |
> >>> |   - Completed Checkpoints        - Checkpoints to Run
> >>> |       - ManifestParser               - TargetDiscovery
> >>> |                                      - TI
> >>> |                                      - TransferIPS
> >>> |                                      - Transfer(s)
> >>> |                                      - Finalizer(s)
> >>> |
> >>> |                                  - DC Workdir = /rpool/dc
> >>> |
> >>> +--------------------------------------------------------------+
> >>>
> >>> Found one; we want to resume from Transfer, so re-load the
> >>> snapshot from the last run:
> >>>
> >>> +--------------------------------------------------------------+
> >>> |
> >>> | DOC:
> >>> |   Persistent Data                Volatile Data
> >>> |
> >>> |   - Completed Checkpoints        - Checkpoints to Run
> >>> |       - ManifestParser               - TargetDiscovery
> >>> |       - TargetDiscovery              - TI
> >>> |       - TI                           - TransferIPS
> >>> |       - TransferIPS                  - Transfer(s)
> >>> |                                      - Finalizer(s)
> >>> |
> >>> |                                  - DC Workdir = /rpool/dc
> >>> |
> >>> +--------------------------------------------------------------+
> >>>
> >>> #Now we run until the end:
> >>> Engine.execute()
> >>>
> >>> ===========================================================
> >>>
> >>
> >
> _______________________________________________
> caiman-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
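For anyone who wants to poke at the Run 2 scenario from Karen's mail, here is a rough Python sketch. It is not the engine code: ToyEngine, register(), execute(), resume_from() and the allow_after_execute flag are all made-up names, and the ZFS snapshots are stood in for by a plain Python set. It also assumes that resuming at a checkpoint needs the snapshot taken after the checkpoint registered just before it, which is one reading of "resume from D, E and F" given snapshots @C to @F.

```python
class ResumeError(Exception):
    """Raised when a resume request cannot be honoured."""


class ToyEngine:
    """Hypothetical stand-in for the engine; not the real API."""

    def __init__(self, snapshots=()):
        # Snapshot names left behind by an earlier run, e.g. {"C", "D"}.
        self.snapshots = set(snapshots)
        self.order = []        # every registered checkpoint, in order
        self.effects = {}      # checkpoint name -> effect on the dataset
        self.next_index = 0    # first checkpoint that has not run yet
        self.has_executed = False

    def register(self, name, effect="snapshot"):
        # effect is one of:
        #   "none"     - dataset not available, so no snapshot is taken
        #   "destroy"  - removes the dataset and every snapshot of it
        #   "snapshot" - dataset exists, snapshot taken on success
        self.order.append(name)
        self.effects[name] = effect

    def execute(self):
        """Run every registered checkpoint that has not run yet."""
        for name in self.order[self.next_index:]:
            effect = self.effects[name]
            if effect == "destroy":
                self.snapshots.clear()
            elif effect == "snapshot":
                self.snapshots.add(name)
        self.next_index = len(self.order)
        self.has_executed = True

    def can_resume_from(self, name):
        # Assumption: resuming at `name` needs the predecessor's snapshot.
        idx = self.order.index(name)
        return idx > 0 and self.order[idx - 1] in self.snapshots

    def resume_from(self, name, allow_after_execute=False):
        if self.has_executed and not allow_after_execute:
            raise ResumeError("resume requested after execution started")
        if not self.can_resume_from(name):
            raise ResumeError("no snapshot to resume from " + name)
        self.next_index = self.order.index(name)


def run2(allow_after_execute):
    """Replay Run 2: execute A, B, C, then register D, E, F and resume at E."""
    engine = ToyEngine(snapshots={"C", "D", "E", "F"})  # left by Run 1
    engine.register("A", effect="none")
    engine.register("B", effect="destroy")
    engine.register("C")
    engine.execute()
    for name in ("D", "E", "F"):
        engine.register(name)
    try:
        engine.resume_from("E", allow_after_execute)
        return "resumed"
    except ResumeError as err:
        return str(err)
```

With allow_after_execute=True, run2() fails because B has wiped @D; with the default, the request is rejected up front. If all six checkpoints are registered and resume_from("E") is called before anything executes, the resume succeeds and B is never re-run.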

