Hi Sarah,

Thank you so much for reviewing the document.  My responses are inline.

On 06/08/10 10:57, Sarah Jelinek wrote:
Hi Karen,

This is a really good design document. I do have some comments/questions. I tried not to repeat others' comments, but if I did, simply refer me to your response emails.

Section 3:
The Data Object Cache is a singleton. All components access the DataObjectCache directly without going through the engine. The engine initializes the DataObjectCache. No other components should initialize the DataObjectCache.
I don't believe this is a correct assumption. As a result of it not being a singleton, it is likely to change some of the interfaces you need to provide in the Engine.

I will update the document to remove the assumption that the DataObjectCache will be a singleton. In terms of providing an interface in the engine to reference the data object cache, I don't think we need a special function. After applications get a reference to the engine singleton, they can simply use engine.doc. I will mention this in the document.
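To make the access pattern concrete, here is a minimal sketch of what "engine.doc" could look like. The names (InstallEngine, DataObjectCache, get_instance) are my illustration of the idea, not the final interfaces:

```python
class DataObjectCache(object):
    """Placeholder for the real DOC; not a singleton itself."""
    def __init__(self):
        self.volatile = {}


class InstallEngine(object):
    _instance = None

    def __init__(self):
        # The engine, not the application, creates the DOC.
        self.doc = DataObjectCache()

    @classmethod
    def get_instance(cls):
        # Engine singleton: applications always get the same instance.
        if cls._instance is None:
            cls._instance = InstallEngine()
        return cls._instance


# Applications reach the DOC through the engine reference:
engine = InstallEngine.get_instance()
engine.doc.volatile["target"] = "/dev/dsk/c0t0d0"
```

Since the engine is the only singleton, the DOC's lifetime is tied to the engine that created it, which is the relationship described above.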


Even though the engine runs the checkpoints, it does not report their progress. Applications that want to monitor progress should register one or more progress receivers with the logger. Checkpoints report their progress directly to the logger. The engine helps to normalize the progress information for the logger. The install logger publishes progress information received from the checkpoints to all registered progress receivers.

So, in looking at your UML diagram, it looks like the logger has to call the engine for the normalize_progress() method. Why then do we not have consumers log using the engine, and have the engine normalize it before sending it to the progress logger? It seems backwards to me that the logger has to call the engine to normalize this data and then send the normalized progress to the progress receiver.
We want to have the InstallLogger be responsible for all progress reporting. Applications register progress receivers with the logger, so I think it is more natural to have checkpoints report their progress to the logger as well. That way, we present the concept that the logger is the middle man that receives and sends the progress information.

We did this during the prototype, and it works well.
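The flow above (checkpoint reports to the logger, the logger asks the engine to normalize, and the receivers get the normalized value) could be sketched roughly like this. All class and method names besides normalize_progress() and InstallLogger are my assumptions for illustration:

```python
class Engine(object):
    def __init__(self, total_work):
        self.total_work = total_work  # total work units across all checkpoints
        self.done_work = 0            # work units from completed checkpoints

    def normalize_progress(self, checkpoint_percent, checkpoint_work):
        # Map a checkpoint-local percentage onto the overall run.
        overall = self.done_work + checkpoint_work * checkpoint_percent / 100.0
        return 100.0 * overall / self.total_work


class InstallLogger(object):
    def __init__(self, engine):
        self.engine = engine
        self.receivers = []

    def register_progress_receiver(self, receiver):
        self.receivers.append(receiver)

    def report_progress(self, checkpoint_percent, checkpoint_work):
        # Checkpoints call this; the logger is the middle man.
        normalized = self.engine.normalize_progress(checkpoint_percent,
                                                    checkpoint_work)
        for receiver in self.receivers:
            receiver(normalized)


seen = []
logger = InstallLogger(Engine(total_work=200))
logger.register_progress_receiver(seen.append)
logger.report_progress(50, 100)   # halfway through a 100-unit checkpoint
```

Here the logger does call back into the engine for normalization, but checkpoints and applications only ever talk to the logger, which is the point of the design.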

Section 6:
Other things we need to allow for with regard to interaction with the engine are: -Getting a DOC object handle for consumers. This is based on the DOC not being a singleton.
I will add information in the document on how the DOC can be accessed via the engine.

-Getting a list of checkpoints registered.

Why do we need this? When we were reviewing the architecture document, I asked about having the engine provide such an interface. Your response was that since the application registered the checkpoints, it would already have the list of registered checkpoints. That made a lot of sense to me, so I didn't add such an interface.

-unregister checkpoints
I cannot think of a use case for needing to unregister checkpoints. Is it really necessary? If we were to provide the interface, I think only un-executed checkpoints could be unregistered. Checkpoints that have already been executed cannot be.


Section 6.2.1:
loglevel: I thought that we wanted to be able to set log level per checkpoint as well? Is this still the case? If it is, why do we have a log level for the Engine as well?
Yes, we want to set the log level per checkpoint. To set a special log level for a checkpoint, the level is provided at registration time of that checkpoint. The log level specified in the __init__() function will be the log level used for the entire application/engine/checkpoints, etc.
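The two levels of control could look something like this with the standard logging module: the engine-wide level comes from __init__(), and a per-checkpoint override is given at registration time. The register_checkpoint() signature here is hypothetical:

```python
import logging


class Engine(object):
    def __init__(self, loglevel=logging.INFO):
        self.logger = logging.getLogger("install_engine")
        self.logger.setLevel(loglevel)          # default for everything

    def register_checkpoint(self, name, log_level=None):
        # Each checkpoint logs through a child logger of the engine's.
        cp_logger = self.logger.getChild(name)
        if log_level is not None:
            cp_logger.setLevel(log_level)       # per-checkpoint override
        return cp_logger


engine = Engine(loglevel=logging.INFO)
quiet = engine.register_checkpoint("disk-setup")
chatty = engine.register_checkpoint("transfer", log_level=logging.DEBUG)
```

With no override, a checkpoint logger inherits the engine-wide level; with one, it uses its own.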

Debug: so, if set to false, when are the DOC snapshots removed from /tmp? My question is how do we know if we want them removed if we are planning on stopping and resuming?
DOC snapshots will be removed from /tmp when the application exits. Section 6.7.1 talks about the __del__() function, which will be called automatically for "destruction" of the engine object instance.

Stopping and resuming is only allowed out-of-process for those checkpoints that have DOC snapshots stored
in the ZFS install target.

Before ZFS install target is available, DOC snapshots are stored in /tmp. After ZFS install target is available, DOC
snapshots are stored in the ZFS install target.


6.3.2: ordering of registered checkpoints
Why can't an application insert a new checkpoint in front of an already executed checkpoint? It seems that if this happens, and we resume from this newly inserted checkpoint, then any subsequent checkpoint has to be re-run, regardless of its current state, which means the engine must clean up the previously run checkpoints' snapshots. I can see that it might be useful to allow an application, or user, to specify a new checkpoint that doesn't follow in order of the last executed checkpoint, but comes before it, with the intent of starting from the newly inserted checkpoint.

We could allow inserting a new checkpoint in front of an already executed checkpoint the way you suggested above, but it is very confusing, in my opinion.

For example,

Application registered A, B, C, D, E, F, and executed A, B, C, D.
So, naturally, the application can resume from A, B, C, or D.
If they just want to continue where they left off last time, they will start from E.

If we allow registering a checkpoint A1 in the middle of executed checkpoints, say, between A and B, then if I want to continue from the previous execution, I would "continue" from A1, re-executing B, C, D...  That's very confusing, I think. Also, what if people want to resume from checkpoint C? That will be invalid now.

I couldn't think of a use case where allowing insertion of a checkpoint in front of already executed checkpoints would be useful, within the same invocation of an application.

Also, just to be clear, you *CAN* insert in front of already executed checkpoints during a subsequent invocation of the application. When you run the app again and register a set of checkpoints with the "new" one inserted in the middle of the checkpoints executed in the previous invocation of the app, you will get exactly the behavior you described above.
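The resume rule in the A-F example above could be expressed as a small helper; the function name and signature are purely illustrative:

```python
def resumable_checkpoints(registered, executed):
    """Return the checkpoints an application may resume from.

    Every successfully executed checkpoint is a valid resume point,
    plus the first not-yet-executed one ("continue where we left off").
    """
    valid = [cp for cp in registered if cp in executed]
    remaining = [cp for cp in registered if cp not in executed]
    if remaining:
        valid.append(remaining[0])
    return valid


# Application registered A-F and executed A-D in a previous run:
points = resumable_checkpoints(["A", "B", "C", "D", "E", "F"],
                               ["A", "B", "C", "D"])
```

Inserting A1 between A and B within the same invocation would break this property: C and D would silently stop being valid resume points.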


Section 6.3.4:
I would think that the input args to register_checkpoint() could be a dictionary or something like that, rather than individual args. Much like an nvlist allows for extensibility in the design, we could do something similar here, in case we ever want or need to change the parameters for registering a checkpoint. I am referring specifically to the checkpoint_name, module_path, checkpoint_obj, insert_before and log_level args.

Do you mean to use keyword arguments such as checkpoint_name=xxx, module_path=xxx, etc.? Or do you mean to specify everything in a dictionary and then pass the whole dictionary as a single argument?

Either way, I think it is not as good as the way it is currently proposed.

The approach of specifying everything as keyword arguments is not very Pythonic. My observation in Python is that all required arguments are explicitly spelled out; optional arguments are then specified as keyword arguments, with their default values given.

I find the approach of specifying them as a single dictionary to be very confusing. You would first have to build the dictionary separately and then pass it in. Additionally, passing a dictionary would require the function to check that all the required arguments are specified. Furthermore, people would not be able to see the default values of the optional arguments.

I understand the desire for extensibility in the future. Since we have *args and **kwargs passed into the register function as well, it is up to the registration function to interpret however many args and kwargs it receives. All "unused" args and kwargs are simply passed to the constructor of the checkpoint class.
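A rough sketch of the currently proposed style: required arguments spelled out, optional ones as keywords with visible defaults, and any extra *args/**kwargs forwarded untouched to the checkpoint constructor. The names and defaults are illustrative, not the final interface:

```python
class Checkpoint(object):
    def __init__(self, name, *args, **kwargs):
        self.name = name
        self.extra_args = args        # anything the engine didn't consume
        self.extra_kwargs = kwargs


def register_checkpoint(checkpoint_name, module_path, checkpoint_class,
                        insert_before=None, log_level=None,
                        *args, **kwargs):
    # Required args are explicit, optional ones show their defaults;
    # unused args/kwargs are simply handed to the checkpoint class.
    return checkpoint_class(checkpoint_name, *args, **kwargs)


cp = register_checkpoint("transfer", "solaris_install/transfer",
                         Checkpoint, dry_run=True)
```

This keeps the required/optional split visible in the signature while still leaving room for checkpoint-specific extensions, which is the extensibility a dictionary would have provided.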


Section 6.5.1: cancel_checkpoint():
It seems that with Python's threading model (that is, there is no kill method for threads), the cancel function is perhaps going to set a flag in the checkpoint object, and the execute() method on that object will have to poll and wait on a condition to stop itself. I would think we would have to spell out in your interface description what has to be done for both the execute() and cancel() methods so implementers know how to make use of the cancel functionality.
It's true that there's no kill method for threads. It is actually up to the checkpoint developer to decide how they want to implement cancel(). They can set a flag when cancel() is called and have execute() check that flag. Or they can choose to do nothing when cancel() is called, if the checkpoint is very small; in that case, we just need to wait for execute() to finish.

I will add an example on how people can implement cancel().
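In the meantime, here is one way a checkpoint developer could implement the flag approach described above, using threading.Event. The class name and the execute() signature are illustrative only:

```python
import threading


class CopyFilesCheckpoint(object):
    def __init__(self):
        self._cancel_requested = threading.Event()
        self.copied = 0

    def cancel(self):
        # Called by the engine from another thread; just sets the flag.
        self._cancel_requested.set()

    def execute(self, work_items):
        for item in work_items:
            # Poll the flag between units of work.
            if self._cancel_requested.is_set():
                return False          # stopped early, work incomplete
            self.copied += 1          # stand-in for real work on item
        return True


cp = CopyFilesCheckpoint()
cp.cancel()                           # engine requests cancellation
finished = cp.execute(range(10))
```

Because the flag is only polled between work items, cancellation is cooperative: a long-running item still finishes before the checkpoint notices the request.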


Also, since it is up to the engine to decide which thread to cancel, how can the consumer know where it can resume, since it may not know what was currently executing?
Good point. Perhaps cancel_checkpoint() could return the name of the checkpoint that was canceled? If the application just wants to continue executing all un-executed checkpoints, it can call execute_checkpoint() with no start_from value.

A general comment on the format of this design doc: I found it hard to figure out which objects the methods were associated with because of the way you laid out the document. Perhaps describing the install engine class first, with all its named methods, along with any other classes such as the checkpoint class and all their methods, and then describing them in detail would make it easier to understand.
All methods are associated only with the InstallEngine class. I can move the Interface Table up so that all the interfaces are shown first and then discussed in detail later. Do you think that's better? The way I have it now was meant to summarize all the interfaces presented in the document.


Section 7.4: For resume_execute_checkpoint(), using a dictionary or something similar as input args, like suggested for execute_checkpoint(), would make this more extensible, imo.
Same reply as the argument list for register_checkpoint() above.


Section 8: Creation of the checkpoint data subtree. In my proposed schema, the checkpoints would be stored in the desired-system-state part of the DOC at the time we read them from the manifest. How does this work in terms of your storing the checkpoints later, only if they were executed successfully? Maybe not an issue; I'm just trying to understand why we need this engine-private data.
I believe the checkpoints proposed in your schema will be values that are used by the application. The application will query that information from the manifest and use it for the register_checkpoint() call. The engine does not get any checkpoint information from the DOC that it did not store there itself. The checkpoint data subtree is meant to store bookkeeping data for the engine to support stop and resume. Since we can only resume from previously successfully executed checkpoints, we only need to store data about those.

Thanks for reviewing again.

--Karen



thanks,
sarah


On 05/28/10 04:12 PM, Karen Tung wrote:
Hi,

The draft install engine design doc is ready for review.
I pushed the doc to the caiman-docs repo.  You can
find them under the install_engine directory in
ssh://[email protected]/hg/caiman/caiman-docs

If you don't want to clone the gate, you can also access
the file directly here:

http://cvs.opensolaris.org/source/xref/caiman/caiman-docs/install_engine

You will find 2 files in the directory:
* engine-design-doc.odt is the design document
* engine-sequence-diag-0528.png is the sequence diagram inserted into chapter 14 of the document. You might want to see the bigger image directly.

Please send your feedback by 6/11/2010.

If you plan to review the document, please send me an email privately,
and preferably also let me know when you will complete the review, so
I can plan accordingly and ensure the document is reviewed thoroughly.

Thanks,

--Karen
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
