Hi Karen,

----- [email protected] wrote:

> On 06/ 4/10 09:19 AM, Darren Kenny wrote:
> > Hi Karen,
> >
> > Do we really support this?
> >
> > I would think that if the base set of checkpoints changes between runs,
> > it's not valid to resume anything out of sequence - once you've inserted
> > checkpoints before one already run, you need to step back to the common
> > point at least before a resume could be done.
> >
> > That's a basic premise of resuming things I would think, and we should
> > discover this before allowing such a resume.
> >
> > Am I wrong here?
> >    
> 
> Hi Darren,
> 
> You are right that the use case I came up with is not valid.
> Sorry about the confusion.

No problem... 
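
For what it's worth, the premise quoted above (step back to the common point
before resuming) could be checked along these lines. This is only a rough
sketch: the function names and the plain lists of checkpoint names are made up
for illustration and are not the engine's API.

```python
def common_prefix_length(previous_run, current_run):
    """Count the leading checkpoints that are identical in both runs."""
    count = 0
    for old, new in zip(previous_run, current_run):
        if old != new:
            break
        count += 1
    return count


def can_resume_at(name, previous_run, current_run):
    """True if every checkpoint registered before `name` is unchanged
    (same names, same order) compared with what completed last time."""
    if name not in current_run:
        return False
    index = current_run.index(name)
    return index <= common_prefix_length(previous_run, current_run)
```

With a previous run of A, B, C, D and a new registration of A, X, B, C, D,
this would allow a resume at X but refuse one at B, since X was inserted
before B.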

> 
> Even with an identical checkpoint list on the 2nd invocation,
> there is still a problem.  In addition to taking DOC snapshots, the engine
> also takes snapshots of the install target ZFS dataset after each
> checkpoint successfully completes, if the ZFS dataset is available.
> 
> Assuming we have the following checkpoints with the following
> functionality:
>
> checkpoint A: loads the manifest
> checkpoint B: checks whether the install target specified in the manifest
>               exists; if so, removes it.
> checkpoint C: creates the install target zfs dataset
> checkpoint D: copies some files into install target
> checkpoint E: modifies some files in install target.
> checkpoint F: modifies more files in install target.
> 
> 
> Run 1: registered and ran checkpoints A, B, C, D, E and F.
> After running all the checkpoints successfully, the engine would
> have created the following ZFS snapshots:
> 
> <install_target_zfs_dataset>@C
> <install_target_zfs_dataset>@D
> <install_target_zfs_dataset>@E
> <install_target_zfs_dataset>@F
> 
> There's no snapshot for checkpoints A and B, because the
> install target zfs dataset does not exist yet.
> In the next invocation of the application, users should be allowed
> to resume from D, E and F.
> 
> Run 2: register and run A, B, and C.  Then, register D, E and F.
> The user wants to resume at E.  This should normally be allowed,
> but it cannot be done since all the snapshots are destroyed by B.
> 
> The above situation can be avoided if we do not allow any
> execute_checkpoint request before a resume request.
>
> For the case of running manifest_parser and then doing the resume, it
> will work.  However, having manifest_parser as a checkpoint means
> the engine will need to allow the general case of resume after
> execution of checkpoints, and that will not work all the time.
> 

Even though B doesn't have a ZFS snapshot, it is by definition included
in the snapshot of C, since a snapshot of C would include any data in the DOC
that was generated by checkpoint B. As such, to resume C, D, E or F, you
wouldn't have to re-run B, so I don't think this is an issue.
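
To illustrate, here is a toy model of Karen's checkpoints A to F. It is a
sketch only: TinyEngine, resume_at() and the dict standing in for the DOC and
install target are hypothetical, and deep copies stand in for snapshots. Unlike
the real engine, it also keeps a copy after A and B, but the resume below only
relies on the one taken after D.

```python
import copy


class TinyEngine:
    """Toy stand-in for the engine; names and behaviour are assumptions."""

    def __init__(self, checkpoints):
        self.checkpoints = checkpoints  # ordered list of (name, function)
        self.state = {}                 # stands in for DOC + install target
        self.snapshots = {}             # name -> copy of state after it ran

    def execute(self, start=0):
        for name, func in self.checkpoints[start:]:
            func(self.state)
            self.snapshots[name] = copy.deepcopy(self.state)

    def resume_at(self, name):
        names = [n for n, _ in self.checkpoints]
        index = names.index(name)
        if index == 0:
            raise ValueError("nothing to roll back to before the first checkpoint")
        # Roll back to the state left behind by the previous checkpoint.
        self.state = copy.deepcopy(self.snapshots[names[index - 1]])
        self.execute(start=index)


calls = []  # records every checkpoint execution, across rollbacks


def checkpoint(name, action):
    def run(state):
        calls.append(name)
        action(state)
    return name, run


engine = TinyEngine([
    checkpoint("A", lambda s: s.update(manifest="loaded")),
    checkpoint("B", lambda s: s.pop("target", None)),
    checkpoint("C", lambda s: s.update(target=[])),
    checkpoint("D", lambda s: s["target"].append("copied")),
    checkpoint("E", lambda s: s["target"].append("modified")),
    checkpoint("F", lambda s: s["target"].append("modified more")),
])

engine.execute()        # first pass: A through F
engine.resume_at("E")   # roll back to the snapshot of D, then run E and F
```

In this model B runs exactly once, yet the state restored for E still reflects
what B (and C and D) did, because the snapshot of D was taken after them.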

In general, I agree that running something like checkpoint B repeatedly could
be a problem - but I also feel that this is something that an application
developer would have to be aware of when setting this sequence up, since
allowing such a destructive checkpoint to re-execute would rule out resuming
any checkpoint beyond it. But I really think that this is unlikely to happen
validly - it's like calling free() on a global variable - you should never do
that unless you are seen to "own" it, and if you do it's most likely a bug...
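
If we did want the engine to catch this rather than leave it to the
application developer, one option might look like the sketch below. Note that
the "destructive" flag, the Checkpoint class and check_resume() are all
hypothetical; nothing like them exists in the engine today as far as this
thread goes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    name: str
    destructive: bool = False  # hypothetical: set by the application developer


def check_resume(target, already_run):
    """Raise if a destructive checkpoint has already run in this invocation,
    since the snapshots needed to resume at `target` may be gone."""
    culprits = [cp.name for cp in already_run if cp.destructive]
    if culprits:
        raise RuntimeError(
            f"cannot resume at {target}: destructive checkpoint(s) "
            f"{', '.join(culprits)} already ran in this invocation"
        )
```

For Karen's Run 2, calling check_resume("E", ...) after A, B and C have run
would raise because B is marked destructive, whereas a run of just A would
pass the check.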

Thanks,

Darren.

> Thanks,
> 
> --Karen
> 
> 
> > Thanks,
> >
> > Darren.
> >
> > On 06/ 4/10 04:58 PM, Karen Tung wrote:
> >    
> >> Hi Darren,
> >>
> >> Thank you for sending out the example below.
> >> The example you provided below indeed works OK.
> >>
> >> However, supporting ManifestParser as a checkpoint that is
> >> executed before any resume request means that
> >> the engine will need to allow a resume_execute() request
> >> after execution has started.
> >>
> >> The following example illustrates the problem of the
> >> engine allowing resume after execution has started
> >> in the general case.
> >>
> >> Run 1 of the application successfully runs checkpoints A, B, C, D.
> >> The persistent DOC will have information about these.
> >>
> >> Run 2 of the application registers checkpoints A1, B1, C1, and requests
> >> the engine to run all of them.  They all run successfully, and the
> >> persistent DOC now has info about A1, B1, C1.  Then, the application
> >> registers checkpoints A, B, C, D, and wants to resume at checkpoint B.
> >> The engine rolls back to checkpoint B, and loses all information
> >> about A1, B1, C1.
> >>
> >> As you can see, if we allow any "resume" request after
> >> execute_checkpoint() has run,
> >> we will run into problems.
> >>
> >> Thanks,
> >>
> >> --Karen
> >>
> >> On 06/ 4/10 06:21 AM, Darren Kenny wrote:
> >>      
> >>> Hi,
> >>>
> >>> Just thought that I'd mention this example after talking with Dermot
> >>> after yesterday's meeting...
> >>>
> >>> I believe that one of the concerns was that if, on resumption of DC, we
> >>> need to call ManifestParser again, then the data currently in the
> >>> Persistent data tree would conflict with this, since it would contain the
> >>> fact that ManifestParser was already run.
> >>>
> >>> I don't think that this is the case, unless you reload a snapshot, and
> >>> you cannot do that until after you've run ManifestParser - since you
> >>> need to do that to get the location of DC's work dir (/rpool/dc) from
> >>> the manifest...
> >>>
> >>> So I don't believe this is a problem because each time you run
> >>> ManifestParser you will be starting with an empty DOC (unless the
> >>> Application puts something in there of course).
> >>>
> >>> I've tried to do this "visually" below...
> >>>
> >>> Hope that resolves the issue being referred to, but if not, please feel
> >>> free to provide me with a specific scenario that I can try to work through.
> >>>
> >>> Thanks,
> >>>
> >>> Darren.
> >>>
> >>>
> >>>
> >>> ===========================================================
> >>>       FIRST RUN
> >>>
> >>>       Add ManifestParser Checkpoint
> >>>
> >>>         +--------------------------------------------------------------+
> >>>         |
> >>>         | DOC:
> >>>         |   Persistent Data               Volatile Data
> >>>         |
> >>>         |   - Completed Checkpoints       - Checkpoints to Run
> >>>         |     - EMPTY                       - ManifestParser
> >>>         |
> >>>         +--------------------------------------------------------------+
> >>>
> >>>       Call Engine.execute()
> >>>       Add Application-specific checkpoints.
> >>>
> >>>         +--------------------------------------------------------------+
> >>>         |
> >>>         | DOC:
> >>>         |   Persistent Data               Volatile Data
> >>>         |
> >>>         |   - Completed Checkpoints       - Checkpoints to Run
> >>>         |     - ManifestParser              - TargetDiscovery
> >>>         |                                   - TI
> >>>         |                                   - TransferIPS
> >>>         |                                   - Transfer(s)
> >>>         |                                   - Finalizer(s)
> >>>         |
> >>>         |                                 - DC Workdir = /rpool/dc
> >>>         |
> >>>         +--------------------------------------------------------------+
> >>>
> >>>
> >>>       #Run up to TransferIPS
> >>>       Engine.execute(end=TransferIPS)
> >>>
> >>>         +--------------------------------------------------------------+
> >>>         |
> >>>         | DOC:
> >>>         |   Persistent Data               Volatile Data
> >>>         |
> >>>         |   - Completed Checkpoints       - Checkpoints to Run
> >>>         |     - ManifestParser              - TargetDiscovery
> >>>         |     - TargetDiscovery             - TI
> >>>         |     - TI                          - TransferIPS
> >>>         |     - TransferIPS                 - Transfer(s)
> >>>         |                                   - Finalizer(s)
> >>>         |
> >>>         |                                 - DC Workdir = /rpool/dc
> >>>         |
> >>>         +--------------------------------------------------------------+
> >>>
> >>> ===========================================================
> >>>
> >>> Now if we stop DC, and then attempt to resume:
> >>>
> >>> ===========================================================
> >>>         +--------------------------------------------------------------+
> >>>         |
> >>>         | DOC:
> >>>         |   Persistent Data               Volatile Data
> >>>         |
> >>>         |   - Completed Checkpoints       - Checkpoints to Run
> >>>         |     - EMPTY                       - ManifestParser
> >>>         |
> >>>         +--------------------------------------------------------------+
> >>>
> >>>       Call Engine.execute()
> >>>       Add Application-specific checkpoints.
> >>>
> >>>       Look for resume-able checkpoints using DC Workdir info...
> >>>
> >>>         +--------------------------------------------------------------+
> >>>         |
> >>>         | DOC:
> >>>         |   Persistent Data               Volatile Data
> >>>         |
> >>>         |   - Completed Checkpoints       - Checkpoints to Run
> >>>         |     - ManifestParser              - TargetDiscovery
> >>>         |                                   - TI
> >>>         |                                   - TransferIPS
> >>>         |                                   - Transfer(s)
> >>>         |                                   - Finalizer(s)
> >>>         |
> >>>         |                                 - DC Workdir = /rpool/dc
> >>>         |
> >>>         +--------------------------------------------------------------+
> >>>
> >>>       Found one; we want to resume from Transfer, so re-load the
> >>>       snapshot from the last run:
> >>>
> >>>         +--------------------------------------------------------------+
> >>>         |
> >>>         | DOC:
> >>>         |   Persistent Data               Volatile Data
> >>>         |
> >>>         |   - Completed Checkpoints       - Checkpoints to Run
> >>>         |     - ManifestParser              - TargetDiscovery
> >>>         |     - TargetDiscovery             - TI
> >>>         |     - TI                          - TransferIPS
> >>>         |     - TransferIPS                 - Transfer(s)
> >>>         |                                   - Finalizer(s)
> >>>         |
> >>>         |                                 - DC Workdir = /rpool/dc
> >>>         |
> >>>         +--------------------------------------------------------------+
> >>>
> >>>       #Now we run until the end:
> >>>       Engine.execute()
> >>>
> >>> ===========================================================
> >>>
> >>>
> >>>
> >>>        
> >>      
> 
> _______________________________________________
> caiman-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss