Re: [caiman-discuss] Checkpoint DOC node proposal

Karen Tung Tue, 25 May 2010 13:07:05 -0700

Hi Darren,

Thanks for the detailed explanation.  I understand
we are proposing to have all checkpoint information
be stored in the DOC at all times.
This proposal does not work for supporting
resume in the engine.


The proposal calls for the application/engine to create
these checkpoint nodes for storing all the checkpoint
information, and store them in the DOC at registration time.
Then, at execute time, the engine will get the list of checkpoints
from the DOC and do its work.  This will work for the
non-resume case.  However, it will not work for the resume case.

Below is an example of a resume case that does not work.  This assumes
the application registers the checkpoints via the engine,
and engine creates the checkpoint nodes and stores the
information there immediately upon registration.  The engine
will not keep a copy of what's already stored in the DOC since
it can just "look it up" when it needs the info.

* First invocation of application:

- application registers checkpoints a, b, c, d, e; these checkpoints are stored
in the DOC immediately.
- application runs all the checkpoints successfully.  The "completed"
flag in the checkpoint is set.  This implies that
all the checkpoints are resumable.
- Application exits

* 2nd invocation of application:
- application registers checkpoints a, b, b1, b2, b3, b4; these checkpoints
are stored in the DOC immediately.
- application calls engine.restore(latest-snapshot-from-previous-run).

- At this time, the DOC is restored back to the state when the first invocation
of the application ended.  Information about all the checkpoints registered
during this invocation of the application is now lost!
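
To make the failure concrete, here is a minimal sketch of the sequence above.
The DOC class, its snapshot()/restore() methods and the register() helper are
hypothetical stand-ins, not the real DOC or engine interfaces; the only
assumption is that registration writes straight into the DOC and that restore
replaces the DOC contents wholesale.

```python
import copy


class DOC:
    """Hypothetical stand-in for the DOC: just a list of checkpoint nodes."""

    def __init__(self):
        self.checkpoints = []

    def snapshot(self):
        return copy.deepcopy(self.checkpoints)

    def restore(self, snap):
        # Restore replaces everything currently in the DOC.
        self.checkpoints = copy.deepcopy(snap)


def register(doc, names):
    # Registration stores nodes in the DOC; the engine keeps no copy.
    doc.checkpoints = [{"name": n, "completed": False} for n in names]


# First invocation: register a..e, run them all, take a snapshot.
doc = DOC()
register(doc, ["a", "b", "c", "d", "e"])
for cp in doc.checkpoints:
    cp["completed"] = True
latest_snapshot = doc.snapshot()

# Second invocation: register a different list, then restore.
doc = DOC()
register(doc, ["a", "b", "b1", "b2", "b3", "b4"])
doc.restore(latest_snapshot)

names_after_restore = [cp["name"] for cp in doc.checkpoints]
print(names_after_restore)  # b1..b4 are gone
```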

Therefore, I have always wanted to keep all information about registered
checkpoints for each particular invocation of the application as *PRIVATE*
data in the engine.  This data will not be stored in the DOC unless the
engine chooses to.  All the data the engine eventually decides to store
in DOC will only be used for computing what's resumable in the next
invocation of the application.  Therefore, only data regarding
successfully executed checkpoints will be stored.
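
A rough sketch of what I have in mind follows.  The Engine class, its method
names and the "common prefix" rule for deciding what is resumable are all
assumptions made for illustration; the point is only that the registration
list lives outside the DOC, so a restore cannot clobber it.

```python
class Engine:
    """Hypothetical engine that keeps registrations private."""

    def __init__(self):
        self._registered = []    # private, per-invocation; never restored
        self.doc_completed = []  # the only checkpoint data kept in the DOC

    def register(self, names):
        self._registered = list(names)

    def restore(self, snapshot):
        # Rolls back DOC contents only; self._registered is untouched.
        self.doc_completed = list(snapshot)

    def resumable(self):
        # Assumed rule: the longest common prefix of the previously
        # completed checkpoints and the currently registered ones.
        prefix = []
        for done, current in zip(self.doc_completed, self._registered):
            if done != current:
                break
            prefix.append(current)
        return prefix


# Second invocation from the example above.
engine = Engine()
engine.register(["a", "b", "b1", "b2", "b3", "b4"])
engine.restore(["a", "b", "c", "d", "e"])  # completed list from first run

registered = engine._registered
resumable = engine.resumable()
print(registered)
print(resumable)
```

With this split, the second invocation still knows about b1 through b4 after
the restore, and only a and b are treated as already done.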

Thanks,

--Karen

On 05/24/10 02:47, Darren Kenny wrote:

Hi Karen,

Comments below...

On 05/20/10 06:53 PM, Karen Tung wrote:

Hi Jean,

I am confused about the last part of this proposal.  Probably you or someone
else at the meeting can provide more details.  Please see below for my comments.

On 05/20/10 08:43, jean.mccormack wrote:

In order to get closure on this issue, Darren, Sarah, Alok and myself met to
resolve the issues.
We discussed the need for the transfer module to be a class with IPS/CPIO as
sub-classes. The original reason was based upon the belief that we had to
register all checkpoints at the top when we wouldn't necessarily know what
type of transfer was desired. With some thought and discussion it was decided
that we didn't have to register everything at the top so the app could, after
user input, know what type of transfer it wanted and register everything to
come at that point. This removes the create_transfer method and the transfer
class.

We discussed where target information would go in the DOC.
  The decision was that target discovery would place this object in the DOC at
a known place. If for some reason there is more than one target discovery
object, they will be stored in order.


We also discussed where the desired (to-be-instantiated) target object would go;
that was also decided to be at a known place in the DOC. If for some reason
there is more than one desired target object, the objects will be stored in
the order upon which they will be needed.

Then we talked more about the checkpoint specific information. The DOC has an
execution object which has checkpoint objects under it. Each checkpoint will
have such an object with checkpoint specific data within it.

Who creates these execution objects, and when?

The "Execution" object would only have to be created once - who creates it,
would most likely be one of two things:

1) The Application - when it wants to insert the list of checkpoints to be
    executed, then it would have to create it, if it didn't already exist.

2) ManifestParser - again, it's in the action of creating the checkpoints,
    if there are any in the manifest, it will create the Execution object,
    if it doesn't already exist - and then add the checkpoints as children.

For clarity, it may be simplest if the Application always created it first...

  The client app will do the following:
- instantiate the appropriate target/transfer objects

Are these target/transfer objects checkpoint objects or something else?

Maybe there is a little confusion here for Targets - Targets are mainly just
other pieces of data. Specifically the data to describe the following:

- "discovered" disk layout (usually generated by TD)
- a copy of this (possibly sparse - TBD) to be the "desired" disk layout,
   mainly used by TI

But, I think what was meant here, is the checkpoints, in other words:

- TargetDiscovery    - which would generate the discovered targets

- TargetInstantiation - would use the discovered, and desired targets to
                         figure out what to do.

- Transfer (Possibly multiples of these)

- store the target/transfer objects at a known place
   in the DOC *in order* of execution

From this bullet and the following bullet point about registering
the checkpoints, it seems to hint that the target/transfer objects
are not checkpoint objects.  If that's the case, why is the *order* important?

Order is only important for the checkpoints - as far as I'm aware...

- register the checkpoints
- execute the checkpoints.
- the completed flag will be set to True in each checkpoint that
has successfully executed.

The application will set the completed flag?  That doesn't sound right.
Since the engine actually invokes the checkpoints, only the engine
would have the direct knowledge about whether a checkpoint is completed.

I would think that whatever is best able to, should be the one to set the
completed flag - in general I would think that this should be the checkpoint
itself since it's the one that should know best whether it was a successful
completion or not. I would think that the Engine should only need to look at
this flag, and skip a checkpoint if it's already completed.
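
As a rough illustration of that division of labour (the Checkpoint class, the
run_all() function and the "work" callable are hypothetical names, not existing
interfaces), the checkpoint records its own outcome and the engine only reads
the flag:

```python
class Checkpoint:
    """Hypothetical checkpoint that sets its own completed flag."""

    def __init__(self, name, work):
        self.name = name
        self.work = work        # callable returning True on success
        self.completed = False

    def execute(self):
        # The checkpoint itself knows best whether it succeeded.
        self.completed = bool(self.work())


def run_all(checkpoints):
    """Engine side: skip completed checkpoints, stop at the first failure."""
    executed = []
    for cp in checkpoints:
        if cp.completed:
            continue
        cp.execute()
        executed.append(cp.name)
        if not cp.completed:
            break
    return executed


cps = [
    Checkpoint("TargetDiscovery", lambda: True),
    Checkpoint("TargetInstantiation", lambda: True),
    Checkpoint("Transfer", lambda: True),
]
cps[0].completed = True  # pretend this one finished in an earlier run
ran = run_all(cps)
print(ran)
```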

Thanks,

Darren.


_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
