On 06/ 4/10 09:19 AM, Darren Kenny wrote:
Hi Karen,

Do we really support this?

I would think that if the base set of checkpoints changes between runs, it's not
valid to resume anything out of sequence - once you've inserted checkpoints
before one already run, you need to step back to the common point at least
before a resume could be done.

That's a basic premise of resuming things I would think, and we should discover
this before allowing such a resume.

Am I wrong here?

Hi Darren,

You are right that the use case I came up with is not valid.
Sorry about the confusion.

Even with an identical checkpoint list on the second invocation,
there is still a problem.  In addition to taking DOC snapshots, the engine
also takes a snapshot of the install target ZFS dataset after each
checkpoint completes successfully, if the ZFS dataset is available.

Assume we have the following checkpoints, with the following functionality:

checkpoint A: loads the manifest
checkpoint B: checks whether the install target specified in the manifest exists;
              if so, removes it
checkpoint C: creates the install target ZFS dataset
checkpoint D: copies some files into the install target
checkpoint E: modifies some files in the install target
checkpoint F: modifies more files in the install target


Run 1: registered and ran checkpoints A, B, C, D, E and F.
After running all the checkpoints successfully, the engine would
have created the following ZFS snapshots:

<install_target_zfs_dataset>@C
<install_target_zfs_dataset>@D
<install_target_zfs_dataset>@E
<install_target_zfs_dataset>@F

There are no snapshots for checkpoints A and B, because the
install target ZFS dataset does not exist yet at that point.
In the next invocation of the application, users should be allowed
to resume from D, E and F.

Run 2: register and run A, B, and C.  Then register D, E and F.
The user wants to resume at E.  This would normally be allowed,
but it cannot be done because all the snapshots were destroyed by B.
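
To make Run 1 and Run 2 concrete, here is a minimal sketch in Python.
It is a toy model only: ToyTarget and run_checkpoints are hypothetical
names, not the engine's API, and the snapshot names use "target" in place
of <install_target_zfs_dataset>.

```python
# Toy model only: none of these names are the real engine API.
class ToyTarget:
    """Stands in for the install target ZFS dataset and its snapshots."""

    def __init__(self):
        self.exists = False
        self.snapshots = []

    def create(self):
        self.exists = True

    def destroy(self):
        # Destroying the dataset takes its snapshots with it.
        self.exists = False
        self.snapshots = []


def run_checkpoints(target, names):
    """Run checkpoints in order; snapshot the target after each one if it exists."""
    for name in names:
        if name == "B" and target.exists:
            target.destroy()      # checkpoint B removes an existing target
        elif name == "C":
            target.create()       # checkpoint C creates the dataset
        if target.exists:
            target.snapshots.append("target@" + name)


target = ToyTarget()

# Run 1: A through F all run; snapshots exist only from C onwards.
run_checkpoints(target, ["A", "B", "C", "D", "E", "F"])
after_run1 = list(target.snapshots)

# Run 2: A, B and C are executed before the resume request.
# (In this toy model a snapshot @A is also taken here, because the dataset
# from Run 1 still exists; B then destroys it along with all the others.)
run_checkpoints(target, ["A", "B", "C"])
after_run2 = list(target.snapshots)

can_resume_at_E = "target@D" in after_run2   # resuming at E needs the state after D
```

In this model only the snapshot for C survives Run 2, so the state needed
to resume at E is gone.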

The above situation can be avoided if we do not allow any execute_checkpoint
request before a resume request.
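
One way such a restriction could look, as a sketch only: ToyEngine and its
method signatures are assumptions made for illustration (the thread only
names the execute_checkpoint and resume_execute() requests), not the
engine's actual interface.

```python
class ToyEngine:
    """Hypothetical engine that refuses to resume once execution has started."""

    def __init__(self):
        self.registered = []
        self.execution_started = False

    def register_checkpoint(self, name):
        self.registered.append(name)

    def execute_checkpoints(self):
        # Running anything marks the session as started.
        self.execution_started = True
        return list(self.registered)

    def resume_execute(self, name):
        if self.execution_started:
            raise RuntimeError(
                "resume is not allowed after checkpoints have been executed")
        if name not in self.registered:
            raise ValueError("unknown checkpoint: " + name)
        self.execution_started = True
        # Return the checkpoints that would run, from `name` to the end.
        return self.registered[self.registered.index(name):]


# Resume requested first: allowed.
engine = ToyEngine()
for cp in "ABCDEF":
    engine.register_checkpoint(cp)
resumed = engine.resume_execute("E")

# A, B, C executed first, then resume requested: rejected.
engine2 = ToyEngine()
for cp in "ABC":
    engine2.register_checkpoint(cp)
engine2.execute_checkpoints()
for cp in "DEF":
    engine2.register_checkpoint(cp)
try:
    engine2.resume_execute("E")
    rejected = False
except RuntimeError:
    rejected = True
```

The check is deliberately coarse: it rejects every resume after execution,
including cases (such as running only manifest_parser first) that would
have worked.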

In the case of running manifest_parser and then doing the resume, it will
work.  However, having manifest_parser as a checkpoint means
the engine would need to allow the general case of resume after
execution of checkpoints, and that will not work all the time.

Thanks,

--Karen


Thanks,

Darren.

On 06/ 4/10 04:58 PM, Karen Tung wrote:
Hi Darren,

Thank you for sending out the example below.
The example you provided below indeed works OK.

However, supporting ManifestParser as a checkpoint that
is executed before any resume request means that
the engine will need to allow a resume_execute() request
after execution has started.

The following example illustrates the problem of the
engine allowing resume after execution has started
in the general case.

Run 1 of the application successfully runs checkpoints A, B, C, D.
The persistent DOC will have information about these.

Run 2 of the application registers checkpoints A1, B1, C1 and requests the
engine to run all of them.  They all run successfully, and the persistent DOC
now has info about A1, B1, C1.  Then the application registers
checkpoints A, B, C, D, and wants to
resume at checkpoint B.  The engine rolls back to checkpoint B and loses all
information about A1, B1, C1.

As you can see, if we allow any "resume" request after
execute_checkpoint() has run,
we will run into problems.
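
The same scenario as a small Python sketch.  This is a toy model under
stated assumptions: the persistent DOC is reduced to a list of completed
checkpoint names, stored_snapshots stands in for the DOC snapshots kept
between runs, and run() and resume_at() are hypothetical helpers.

```python
# Stands in for the DOC snapshots kept on disk between runs (hypothetical).
stored_snapshots = {}


def run(doc, names):
    """Run checkpoints; after each one, record it and snapshot the DOC."""
    for name in names:
        doc.append(name)
        stored_snapshots[name] = list(doc)


def resume_at(names, name):
    """Return the DOC contents for resuming at `name`.

    That is the snapshot taken after the checkpoint preceding `name`,
    or an empty DOC if `name` is the first checkpoint.
    """
    idx = names.index(name)
    if idx == 0:
        return []
    return list(stored_snapshots[names[idx - 1]])


# Run 1: A, B, C, D complete.
run([], ["A", "B", "C", "D"])

# Run 2: A1, B1, C1 are executed first and recorded in the persistent DOC.
doc = []
run(doc, ["A1", "B1", "C1"])
before_resume = list(doc)

# The application now registers A, B, C, D and resumes at B.
doc = resume_at(["A", "B", "C", "D"], "B")
lost = [name for name in before_resume if name not in doc]
```

After the rollback the DOC holds only what Run 1 had recorded up to A, and
everything A1, B1 and C1 contributed is in `lost`.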

Thanks,

--Karen

On 06/ 4/10 06:21 AM, Darren Kenny wrote:
Hi,

Just thought I'd mention this example after talking with Dermot
after yesterday's meeting...

I believe that one of the concerns was that if, on resumption of DC, we need
to call ManifestParser again, then the data currently in the persistent data
tree would conflict with this, since it would record that
ManifestParser had already been run.

I don't think that this is the case, unless you reload a snapshot, and you
cannot do that until after you've run ManifestParser - since you need to do
that to get the location of DC's work dir (/rpool/dc) from the manifest...

So I don't believe this is a problem because each time you run ManifestParser
you will be starting with an empty DOC (unless the Application puts something
in there of course).

I've tried to do this "visually" below...

Hope that resolves the issue being referred to, but if not, please feel free
to provide me with a specific scenario that I can try to work through.

Thanks,

Darren.



===========================================================
      FIRST RUN

      Add ManifestParser Checkpoint

        +--------------------------------------------------------------+
        |
        | DOC:
        |   Persistent Data               Volatile Data
        |
        |   - Completed Checkpoints       - Checkpoints to Run
        |     - EMPTY                       - ManifestParser
        |
        +--------------------------------------------------------------+

      Call Engine.execute()
      Add Application-specific checkpoints.

        +--------------------------------------------------------------+
        |
        | DOC:
        |   Persistent Data               Volatile Data
        |
        |   - Completed Checkpoints       - Checkpoints to Run
        |     - ManifestParser              - TargetDiscovery
        |                                   - TI
        |                                   - TransferIPS
        |                                   - Transfer(s)
        |                                   - Finalizer(s)
        |
        |                                 - DC Workdir = /rpool/dc
        |
        +--------------------------------------------------------------+


      #Run up to TransferIPS
      Engine.execute(end=TransferIPS)

        +--------------------------------------------------------------+
        |
        | DOC:
        |   Persistent Data               Volatile Data
        |
        |   - Completed Checkpoints       - Checkpoints to Run
        |     - ManifestParser              - TargetDiscovery
        |     - TargetDiscovery             - TI
        |     - TI                          - TransferIPS
        |     - TransferIPS                 - Transfer(s)
        |                                   - Finalizer(s)
        |
        |                                 - DC Workdir = /rpool/dc
        |
        +--------------------------------------------------------------+

===========================================================

Now if we stop DC, and then attempt to resume:

===========================================================
        +--------------------------------------------------------------+
        |
        | DOC:
        |   Persistent Data               Volatile Data
        |
        |   - Completed Checkpoints       - Checkpoints to Run
        |     - EMPTY                       - ManifestParser
        |
        +--------------------------------------------------------------+

      Call Engine.execute()
      Add Application-specific checkpoints.

      Look for resume-able checkpoints using DC Workdir info...

        +--------------------------------------------------------------+
        |
        | DOC:
        |   Persistent Data               Volatile Data
        |
        |   - Completed Checkpoints       - Checkpoints to Run
        |     - ManifestParser              - TargetDiscovery
        |                                   - TI
        |                                   - TransferIPS
        |                                   - Transfer(s)
        |                                   - Finalizer(s)
        |
        |                                 - DC Workdir = /rpool/dc
        |
        +--------------------------------------------------------------+

      Found one; we want to resume from Transfer, so reload the snapshot from
      the last run:

        +--------------------------------------------------------------+
        |
        | DOC:
        |   Persistent Data               Volatile Data
        |
        |   - Completed Checkpoints       - Checkpoints to Run
        |     - ManifestParser              - TargetDiscovery
        |     - TargetDiscovery             - TI
        |     - TI                          - TransferIPS
        |     - TransferIPS                 - Transfer(s)
        |                                   - Finalizer(s)
        |
        |                                 - DC Workdir = /rpool/dc
        |
        +--------------------------------------------------------------+

      #Now we run until the end:
      Engine.execute()

===========================================================
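
The two runs above can also be sketched in Python.  This is an illustration
under assumptions, not the engine's code: the DOC is a plain dict,
workdir_store is a hypothetical stand-in for snapshots saved under the DC
work dir, and "Transfer" and "Finalizer" are single placeholders for the
Transfer(s) and Finalizer(s) checkpoints.

```python
# Hypothetical stand-in for DOC snapshots saved under the DC work dir.
workdir_store = {}   # workdir path -> completed checkpoint names saved there

APP_CHECKPOINTS = ["TargetDiscovery", "TI", "TransferIPS", "Transfer", "Finalizer"]


def new_doc():
    """Every run starts with an empty DOC."""
    return {"completed": [], "workdir": None}


def run_manifest_parser(doc):
    doc["completed"].append("ManifestParser")
    doc["workdir"] = "/rpool/dc"   # work dir as read from the manifest


def execute(doc, checkpoints, end=None):
    """Run the not-yet-completed checkpoints in order, stopping after `end`."""
    for name in checkpoints:
        if name in doc["completed"]:
            continue
        doc["completed"].append(name)
        # Save the persistent data under the work dir after each checkpoint.
        workdir_store[doc["workdir"]] = list(doc["completed"])
        if name == end:
            break


# FIRST RUN: ManifestParser, then run up to TransferIPS.
doc = new_doc()
run_manifest_parser(doc)
execute(doc, APP_CHECKPOINTS, end="TransferIPS")
first_run_completed = list(doc["completed"])

# RESUME: start empty, run ManifestParser again to learn the work dir,
# and only then reload the snapshot from the last run.
doc = new_doc()
started_empty = doc["completed"] == []
run_manifest_parser(doc)
saved = workdir_store.get(doc["workdir"])
if saved is not None:
    doc["completed"] = list(saved)
execute(doc, APP_CHECKPOINTS)   # runs Transfer and Finalizer
final_completed = list(doc["completed"])
```

ManifestParser never sees data from the previous run here, because the
snapshot can only be located, and therefore reloaded, after it has run.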



