Hi Darren,

Further comments inline.

On 06/ 8/10 05:26 AM, Darren Kenny wrote:
Hi Karen,

----- [email protected] wrote:

On 06/ 4/10 09:19 AM, Darren Kenny wrote:
Hi Karen,

Do we really support this?

I would think that if the base set of checkpoints changes between runs,
it's not valid to resume anything out of sequence - once you've inserted
checkpoints before one already run, you need to step back to at least the
common point before a resume could be done.

That's a basic premise of resuming things, I would think, and we should
detect this before allowing such a resume.

Am I wrong here?

Hi Darren,

You are right that the use case I came up with is not valid.
Sorry about the confusion.
No problem...

Even with an identical checkpoint list on the 2nd invocation,
there is still a problem.  In addition to taking DOC snapshots, the
engine also takes a snapshot of the install target ZFS dataset after
each checkpoint successfully completes, if the ZFS dataset is available.

Assuming we have the following checkpoints with the following
functionalities:

checkpoint A: loads the manifest
checkpoint B: checks whether the install target specified in the
              manifest exists; if so, removes it
checkpoint C: creates the install target ZFS dataset
checkpoint D: copies some files into the install target
checkpoint E: modifies some files in the install target
checkpoint F: modifies more files in the install target
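To make the snapshot behavior concrete, here is a rough Python sketch of an engine loop like the one described above.  The names (Engine, Checkpoint, Dataset) are made up for illustration, not the real engine API: only checkpoints that complete after the install target dataset exists get a ZFS snapshot.

```python
class Checkpoint:
    """A named checkpoint with a callable that does its work."""
    def __init__(self, name, func):
        self.name = name
        self.func = func

class Dataset:
    """Stand-in for the install target ZFS dataset."""
    def __init__(self, name):
        self.name = name
        self.created = False   # checkpoint C will flip this

class Engine:
    def __init__(self, target):
        self.target = target
        self.snapshots = []    # ZFS snapshots taken so far

    def execute(self, checkpoints):
        for cp in checkpoints:
            cp.func()                      # run the checkpoint
            if self.target.created:        # snapshot only if the dataset exists
                self.snapshots.append("%s@%s" % (self.target.name, cp.name))

# Run 1: A and B run before the dataset exists, so only C..F get snapshots.
ds = Dataset("rpool/target")
engine = Engine(ds)

def create_target():
    ds.created = True

noop = lambda: None
engine.execute([Checkpoint("A", noop), Checkpoint("B", noop),
                Checkpoint("C", create_target), Checkpoint("D", noop),
                Checkpoint("E", noop), Checkpoint("F", noop)])
```

After this run, `engine.snapshots` holds exactly the four snapshots listed below (C, D, E, F), matching the Run 1 behavior described in this mail.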


Run 1: registered and ran checkpoints A, B, C, D, E and F.
After running all the checkpoints successfully, the engine would
have created the following ZFS snapshots:

<install_target_zfs_dataset>@C
<install_target_zfs_dataset>@D
<install_target_zfs_dataset>@E
<install_target_zfs_dataset>@F

There are no snapshots for checkpoints A and B, because the
install target ZFS dataset did not exist yet.
In the next invocation of the application, users should be allowed
to resume from D, E and F.

Run 2: register and run A, B, and C.  Then, register D, E and F.
The user wants to resume at E.  Normally this should be allowed,
but it cannot be done, since all the snapshots were destroyed by B.

The above situation can be avoided if we do not allow any
execute_checkpoint request before a resume request.
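That "no execute before resume" policy could look something like the following sketch (hypothetical class and method names, not the actual engine interface): once execute_checkpoint() has been called in an invocation, any later resume request is refused.

```python
class ResumeAfterExecuteError(Exception):
    """Raised when a resume is requested after execution has begun."""

class Engine:
    def __init__(self):
        self._execute_started = False

    def execute_checkpoint(self, name):
        # Once any checkpoint executes, earlier snapshots may have been
        # invalidated (e.g. by a destructive checkpoint like B above).
        self._execute_started = True
        # ... run the checkpoint, take snapshots ...

    def resume_execute(self, name):
        if self._execute_started:
            raise ResumeAfterExecuteError(
                "resume requested after execution began; snapshots "
                "may have been destroyed by earlier checkpoints")
        # ... roll back to the snapshot taken before 'name' and re-run ...
```

With this guard, the Run 2 scenario above (run A, B, C, then ask to resume at E) fails fast with a clear error instead of silently resuming from destroyed state.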

For the case of running manifest_parser and then doing the resume, it
will work.  However, having manifest_parser as a checkpoint means the
engine will need to allow the general case of resuming after
checkpoints have executed, and that will not work all the time.

Even though B doesn't have a ZFS snapshot, it is by definition included
in the snapshot of C, since a snapshot taken after C would include any
data in the DOC that was generated by checkpoint B.  As such, to resume
C, D, E or F, you wouldn't have to re-run B, so I don't think this is
an issue.
It's true that a DOC snapshot taken after C finishes would include
information about B.  That snapshot even includes information about
A too.  However, in order to resume at C, we must have a snapshot
that does not include information about C.  That way, we can run
C again, and take a snapshot.  Since we don't have a ZFS dataset on
which to store the DOC snapshot taken after B completes, we really
cannot resume at C.
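Put another way: a checkpoint is resumable only if its predecessor has a snapshot.  A tiny illustrative helper (not real engine code) makes the rule concrete for the A-F example:

```python
def resumable_checkpoints(order, snapshots):
    """Return the checkpoints that can be resumed, given the ordered
    checkpoint names and the set of names that have a snapshot.
    To resume *at* checkpoint X we need the snapshot taken after X's
    predecessor, i.e. a snapshot that does not yet include X's work."""
    result = []
    for i, name in enumerate(order):
        if i > 0 and order[i - 1] in snapshots:
            result.append(name)
    return result

# Run 1 left snapshots only for C, D, E and F (no dataset existed
# after A or B).  C itself cannot be resumed, because its
# predecessor B has no snapshot -- but D, E and F can:
print(resumable_checkpoints(list("ABCDEF"), {"C", "D", "E", "F"}))
# -> ['D', 'E', 'F']
```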
In general, I agree that running something like checkpoint B repeatedly
could be a problem - but I also feel that this is something that an
application developer would have to be aware of at the time of setting
this sequence up, since allowing such a destructive checkpoint to
re-execute would totally dismiss the possibility of any checkpoint
beyond it being resumed.  But I really think that this is an unlikely
thing to happen validly - it's like calling free() on a global
variable - you should never do that unless you are seen to "own" it,
and if you do, it's most likely a bug...
It's true that an application developer shouldn't try to run
destructive things like that.  However, regardless of what the
application developer does, the engine should always present
consistent behavior.  If we don't allow resume after execution
starts, we prevent this inconsistent behavior.

So, going back to the original problem that brought up this discussion.

We want to run manifest-parser as a checkpoint.  In order to allow that,
we would have to make it general policy that the engine allows a
resume request after execution has begun.  This policy works
OK sometimes, but could cause problems for some use cases.

IMO, even if there were no problem with resuming after execution has
begun, I feel that the data from a manifest is part of the input to
DOC initialization, so the manifest should probably be passed as an
optional argument when we instantiate the DOC.
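As a sketch of what I mean (DataObjectCache here is a stand-in, not the real DOC class), the manifest would simply seed the cache at construction time, so no checkpoint - and therefore no resume bookkeeping - is needed for parsing itself:

```python
class DataObjectCache:
    """Illustrative stand-in for the DOC, taking the parsed manifest
    as optional input to initialization."""
    def __init__(self, manifest=None):
        self.persistent = {}
        self.volatile = {}
        if manifest is not None:
            # Seed the cache from the manifest before any checkpoint
            # runs; the manifest is input, not a checkpoint result.
            self.persistent["manifest"] = manifest

# With a manifest, the DOC starts pre-populated; without one, empty.
doc = DataObjectCache(manifest={"target": "rpool/dc"})
```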

Thanks,

--Karen


Thanks,

Darren.

Thanks,

--Karen


Thanks,

Darren.

On 06/ 4/10 04:58 PM, Karen Tung wrote:

Hi Darren,

Thank you for sending out the example below; it indeed works OK.

However, supporting ManifestParser as a checkpoint that is executed
before any resume request means that the engine will need to allow a
resume_execute() request after execution has started.

The following example illustrates the problem with the engine
allowing resume after execution has started in the general case.

Run 1 of the application successfully ran checkpoints A, B, C, D.
The persistent DOC will have information about these.

Run 2 of the application registers checkpoints A1, B1, C1, and
requests the engine to run all of them.  They all run successfully,
and the persistent DOC now has info about A1, B1, C1.  Then, the
application registers checkpoints A, B, C, D, and wants to resume at
checkpoint B.  The engine rolls back to checkpoint B, and loses all
information about A1, B1, C1.

As you can see, if we allow any "resume" request after
execute_checkpoint() has run, we will run into problems.

Thanks,

--Karen

On 06/ 4/10 06:21 AM, Darren Kenny wrote:

Hi,

Just thought that I'd like to mention this example after talking
with Dermot after yesterday's meeting...

I believe that one of the concerns was that, on resumption of DC,
when we need to call ManifestParser again, the data currently in the
persistent data tree would conflict with this, since it would contain
the fact that ManifestParser was already run.

I don't think that this is the case, unless you reload a snapshot,
and you cannot do that until after you've run ManifestParser - since
you need to do that to get the location of DC's work dir (/rpool/dc)
from the manifest...  So I don't believe this is a problem, because
each time you run ManifestParser you will be starting with an empty
DOC (unless the application puts something in there, of course).

I've tried to do this "visually" below...

Hope that resolves the issue being referred to, but if not, please
feel free to provide me with a specific scenario that I can try to
work through.
Thanks,

Darren.



===========================================================
       FIRST RUN

       Add ManifestParser Checkpoint


+--------------------------------------------------------------+
         |
         | DOC:
         |   Persistent Data               Volatile Data
         |
         |   - Completed Checkpoints       - Checkpoints to Run
         |     - EMPTY                     - ManifestParser
         |

+--------------------------------------------------------------+
       Call Engine.execute()
       Add Application-specific checkpoints.


+--------------------------------------------------------------+
         |
         | DOC:
         |   Persistent Data               Volatile Data
         |
         |   - Completed Checkpoints       - Checkpoints to Run
         |     - ManifestParser              - TargetDiscovery
         |                                   - TI
         |                                   - TransferIPS
         |                                   - Transfer(s)
         |                                   - Finalizer(s)
         |
         |                                 - DC Workdir = /rpool/dc
         |

+--------------------------------------------------------------+

       #Run up to TransferIPS
       Engine.execute(end=TransferIPS)


+--------------------------------------------------------------+
         |
         | DOC:
         |   Persistent Data               Volatile Data
         |
         |   - Completed Checkpoints       - Checkpoints to Run
         |     - ManifestParser              - TargetDiscovery
         |     - TargetDiscovery             - TI
         |     - TI                          - TransferIPS
         |     - TransferIPS                 - Transfer(s)
         |                                   - Finalizer(s)
         |
         |                                 - DC Workdir = /rpool/dc
         |

+--------------------------------------------------------------+
===========================================================

Now if we stop DC, and then attempt to resume:

===========================================================

+--------------------------------------------------------------+
         |
         | DOC:
         |   Persistent Data               Volatile Data
         |
         |   - Completed Checkpoints       - Checkpoints to Run
         |     - EMPTY                       - ManifestParser
         |

+--------------------------------------------------------------+
       Call Engine.execute()
       Add Application-specific checkpoints.

       Look for resume-able checkpoints using DC Workdir info...


+--------------------------------------------------------------+
         |
         | DOC:
         |   Persistent Data               Volatile Data
         |
         |   - Completed Checkpoints       - Checkpoints to Run
         |     - ManifestParser              - TargetDiscovery
         |                                   - TI
         |                                   - TransferIPS
         |                                   - Transfer(s)
         |                                   - Finalizer(s)
         |
         |                                 - DC Workdir = /rpool/dc
         |

+--------------------------------------------------------------+
       Found one; we want to resume from Transfer, so re-load the
       snapshot from the last run:

+--------------------------------------------------------------+
         |
         | DOC:
         |   Persistent Data               Volatile Data
         |
         |   - Completed Checkpoints       - Checkpoints to Run
         |     - ManifestParser              - TargetDiscovery
         |     - TargetDiscovery             - TI
         |     - TI                          - TransferIPS
         |     - TransferIPS                 - Transfer(s)
         |                                   - Finalizer(s)
         |
         |                                 - DC Workdir = /rpool/dc
         |

+--------------------------------------------------------------+
       #Now we run until the end:
       Engine.execute()

===========================================================
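For what it's worth, the Engine.execute(end=...) behavior used in the diagrams above could be sketched roughly like this (illustrative only - the real engine does snapshotting and DOC updates as well):

```python
def execute(checkpoints, end=None):
    """Run the registered checkpoints in order; if 'end' is given,
    stop after that checkpoint completes.  Returns the names of the
    checkpoints that ran (the work itself is stubbed out here)."""
    completed = []
    for name in checkpoints:
        completed.append(name)   # run the checkpoint (stubbed)
        if name == end:
            break                # stop after the requested endpoint
    return completed

# First run in the diagrams: stop after TransferIPS, leaving
# Transfer and Finalizer still to run on resume.
run = execute(["ManifestParser", "TargetDiscovery", "TI",
               "TransferIPS", "Transfer", "Finalizer"],
              end="TransferIPS")
```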





_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
