Hi Sarah,

Thank you so much for reviewing the document.  My responses are inline.

On 06/08/10 10:57, Sarah Jelinek wrote:
Hi Karen,

This is a really good design document. I do have some comments/questions. I tried not to repeat others' comments, but if I did, simply refer me to your response emails.

Section 3:
The Data Object Cache is a singleton. All components access the DataObjectCache directly without going through the engine. The engine initializes the DataObjectCache. No other components should initialize the DataObjectCache.
I don't believe this is a correct assumption. As a result of it not being a singleton, it is likely to change some of the interfaces you need to provide in the Engine.

I will update the document to remove the assumption that the DataObjectCache will be a singleton. In terms of providing an interface in the engine to reference the data object cache, I don't think we need a special function. After applications get a reference to the engine singleton, they can simply use engine.doc. I will mention this in the document.
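To make the access pattern concrete, here is a minimal sketch of what "engine.doc" could look like. The names (InstallEngine, DataObjectCache, get_instance) are my illustration of the idea, not the final interfaces:

```python
class DataObjectCache(object):
    """Placeholder for the real DOC; not a singleton itself."""
    def __init__(self):
        self.volatile = {}


class InstallEngine(object):
    _instance = None

    def __init__(self):
        # The engine, not the application, creates the DOC.
        self.doc = DataObjectCache()

    @classmethod
    def get_instance(cls):
        # Engine singleton: applications always get the same instance.
        if cls._instance is None:
            cls._instance = InstallEngine()
        return cls._instance


# Applications reach the DOC through the engine reference:
engine = InstallEngine.get_instance()
engine.doc.volatile["target"] = "/dev/dsk/c0t0d0"
```

Since the engine is the only singleton, the DOC's lifetime is tied to the engine that created it, which is the relationship described above.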


Even though the engine runs the checkpoints, it does not report their progress. Applications that want to monitor progress should register one or more progress receivers with the logger. Checkpoints report their progress directly to the logger. The engine helps to normalize the progress information for the logger. The install logger publishes progress information received from the checkpoints to all registered progress receivers.

So, in looking at your UML diagram, it looks like the logger has to call the engine for the normalize_progress() method. Why then do we not have consumers log using the engine, and have the engine normalize it before sending it to the progress logger? It seems backwards to me that the logger has to call the engine to normalize this data and then send the normalized progress to the progress receiver.
We want to have the InstallLogger be responsible for all progress reporting. Applications register progress receivers with the logger, so I think it is more natural to have checkpoints report their progress to the logger as well. That way, we present the concept that the logger is the middle man that receives and sends the progress information.

We did this during the prototype, and it works well.
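The flow above (checkpoint reports to the logger, the logger asks the engine to normalize, and the receivers get the normalized value) could be sketched roughly like this. All class and method names besides normalize_progress() and InstallLogger are my assumptions for illustration:

```python
class Engine(object):
    def __init__(self, total_work):
        self.total_work = total_work  # total work units across all checkpoints
        self.done_work = 0            # work units from completed checkpoints

    def normalize_progress(self, checkpoint_percent, checkpoint_work):
        # Map a checkpoint-local percentage onto the overall run.
        overall = self.done_work + checkpoint_work * checkpoint_percent / 100.0
        return 100.0 * overall / self.total_work


class InstallLogger(object):
    def __init__(self, engine):
        self.engine = engine
        self.receivers = []

    def register_progress_receiver(self, receiver):
        self.receivers.append(receiver)

    def report_progress(self, checkpoint_percent, checkpoint_work):
        # Checkpoints call this; the logger is the middle man.
        normalized = self.engine.normalize_progress(checkpoint_percent,
                                                    checkpoint_work)
        for receiver in self.receivers:
            receiver(normalized)


seen = []
logger = InstallLogger(Engine(total_work=200))
logger.register_progress_receiver(seen.append)
logger.report_progress(50, 100)   # halfway through a 100-unit checkpoint
```

Here the logger does call back into the engine for normalization, but checkpoints and applications only ever talk to the logger, which is the point of the design.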

Section 6:
Other things we need to allow for with regard to interaction with the engine are: -Getting a DOC object handle for consumers. This is based on the DOC not being a singleton.
I will add information in the document on how the DOC can be accessed via the engine.

-Getting a list of checkpoints registered.

Why do we need this? When we were reviewing the architecture document, I asked about having the engine provide such an interface. Your response was that since the application registered the checkpoints, it would already have the list of registered checkpoints. That made a lot of sense to me, so I didn't add such an interface.

-unregister checkpoints
I cannot think of a use case for needing to unregister checkpoints. Is it really necessary? If we were to provide the interface, I think only un-executed checkpoints could be unregistered. Checkpoints that have already been executed cannot be.


Section 6.2.1:
loglevel: I thought that we wanted to be able to set log level per checkpoint as well? Is this still the case? If it is, why do we have a log level for the Engine as well?
Yes, we want to set the log level per checkpoint. To set a special log level for a checkpoint, the level is provided at registration time of that checkpoint. The log level specified in the __init__() function will be the log level used for the entire application/engine/checkpoints, etc.
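The two levels of control could look something like this with the standard logging module: the engine-wide level comes from __init__(), and a per-checkpoint override is given at registration time. The register_checkpoint() signature here is hypothetical:

```python
import logging


class Engine(object):
    def __init__(self, loglevel=logging.INFO):
        self.logger = logging.getLogger("install_engine")
        self.logger.setLevel(loglevel)          # default for everything

    def register_checkpoint(self, name, log_level=None):
        # Each checkpoint logs through a child logger of the engine's.
        cp_logger = self.logger.getChild(name)
        if log_level is not None:
            cp_logger.setLevel(log_level)       # per-checkpoint override
        return cp_logger


engine = Engine(loglevel=logging.INFO)
quiet = engine.register_checkpoint("disk-setup")
chatty = engine.register_checkpoint("transfer", log_level=logging.DEBUG)
```

With no override, a checkpoint logger inherits the engine-wide level; with one, it uses its own.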

Debug: so, if set to false, when are the DOC snapshots removed from /tmp? My question is how do we know if we want them removed if we are planning on stopping and resuming?
DOC snapshots will be removed from /tmp when the application exits. Section 6.7.1 talks about the __del__() function, which will be called automatically for "destruction" of the engine object instance.

Stopping and resuming is only allowed out-of-process for those checkpoints that have DOC snapshots stored
in the ZFS install target.

Before ZFS install target is available, DOC snapshots are stored in /tmp. After ZFS install target is available, DOC
snapshots are stored in the ZFS install target.


6.3.2: ordering of registered checkpoints
Why can't an application insert a new checkpoint in front of an already executed checkpoint? It seems that if this happens, and we resume from this newly inserted checkpoint, then any subsequent checkpoint has to be re-run, regardless of its current state, which means the engine must clean up the previously run checkpoints' snapshots. I can see that it might be useful to allow an application, or user, to specify a new checkpoint that doesn't follow in order of the last executed checkpoint, but comes before it, with the intent of starting from the newly inserted checkpoint.

We could allow inserting a new checkpoint in front of an already executed checkpoint the way you suggested above, but it is very confusing, in my opinion.

For example,

Application registered A, B, C, D, E, F, and executed A, B, C, D.
So, naturally, the application can resume from A, B, C, or D.
If they just want to continue where they left off last time, they will start from E.

If we allow registering a checkpoint A1 in the middle of executed checkpoints, say, between A and B, then if I want to continue from the previous execution, I would "continue" from A1, re-executing B, C, D...  That's very confusing, I think. Also, what if people want to resume from checkpoint C? That will be invalid now.

I couldn't think of a use case where allowing insertion of a checkpoint in front of already executed checkpoints would be useful, within the same invocation of an application.

Also, just to be clear, you *CAN* insert in front of already executed checkpoints during a subsequent invocation of the application. When you run the app again and register a set of checkpoints with the "new" one inserted in the middle of the checkpoints executed in the previous invocation of the app, you will get exactly the behavior you described above.
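The resume rule in the A-F example above could be expressed as a small helper; the function name and signature are purely illustrative:

```python
def resumable_checkpoints(registered, executed):
    """Return the checkpoints an application may resume from.

    Every successfully executed checkpoint is a valid resume point,
    plus the first not-yet-executed one ("continue where we left off").
    """
    valid = [cp for cp in registered if cp in executed]
    remaining = [cp for cp in registered if cp not in executed]
    if remaining:
        valid.append(remaining[0])
    return valid


# Application registered A-F and executed A-D in a previous run:
points = resumable_checkpoints(["A", "B", "C", "D", "E", "F"],
                               ["A", "B", "C", "D"])
```

Inserting A1 between A and B within the same invocation would break this property: C and D would silently stop being valid resume points.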


Section 6.3.4:
I would think that the input args to register_checkpoint() could be a dictionary or something like that, rather than individual args. Much like an nvlist allows for extensibility in the design, we could do something similar here, in case we ever want or need to change the parameters for registering a checkpoint. I am referring specifically to the checkpoint_name, module_path, checkpoint_obj, insert_before and log_level args.

Do you mean to use keyword arguments such as checkpoint_name=xxx, module_path=xxx, etc.? Or do you mean to specify everything in a dictionary and then pass the whole dictionary as a single argument?

Either way, I think it is not as good as the way it is currently proposed.

The approach of specifying everything as keyword arguments is not very Pythonic. My observation in Python is that all required arguments are explicitly spelled out; optional arguments are then specified as keyword arguments, with their default values given.

I find the approach of specifying them as a single dictionary to be very confusing. You would first have to build the dictionary separately and then pass it in. Additionally, passing a dictionary would require the function to check that all the required arguments are specified. Furthermore, people would not be able to see the default values of the optional arguments.

I understand the desire for extensibility in the future. Since we have *args and **kwargs passed into the register function as well, it is up to the registration function to interpret however many args and kwargs it receives. All "unused" args and kwargs are simply passed to the constructor of the checkpoint class.
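A rough sketch of the currently proposed style: required arguments spelled out, optional ones as keywords with visible defaults, and any extra *args/**kwargs forwarded untouched to the checkpoint constructor. The names and defaults are illustrative, not the final interface:

```python
class Checkpoint(object):
    def __init__(self, name, *args, **kwargs):
        self.name = name
        self.extra_args = args        # anything the engine didn't consume
        self.extra_kwargs = kwargs


def register_checkpoint(checkpoint_name, module_path, checkpoint_class,
                        insert_before=None, log_level=None,
                        *args, **kwargs):
    # Required args are explicit, optional ones show their defaults;
    # unused args/kwargs are simply handed to the checkpoint class.
    return checkpoint_class(checkpoint_name, *args, **kwargs)


cp = register_checkpoint("transfer", "solaris_install/transfer",
                         Checkpoint, dry_run=True)
```

This keeps the required/optional split visible in the signature while still leaving room for checkpoint-specific extensions, which is the extensibility a dictionary would have provided.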


Section 6.5.1: cancel_checkpoint():
It seems that with Python's threading model (that is, there is no kill method for threads), the cancel function is perhaps going to set a flag in the checkpoint object, and the execute() method on that object will have to poll and wait on a condition to stop itself. I would think we would have to spell out in your interface description what has to be done for both the execute() and cancel() methods so implementers know how to make use of the cancel functionality.
It's true that there's no kill method for threads. It is actually up to the checkpoint developer to decide how they want to implement cancel(). They can set a flag when cancel() is called and have execute() check that flag. Or they can choose to do nothing when cancel() is called, if the checkpoint is very small; in that case, we just need to wait for execute() to finish.

I will add an example on how people can implement cancel().
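In the meantime, here is one way a checkpoint developer could implement the flag approach described above, using threading.Event. The class name and the execute() signature are illustrative only:

```python
import threading


class CopyFilesCheckpoint(object):
    def __init__(self):
        self._cancel_requested = threading.Event()
        self.copied = 0

    def cancel(self):
        # Called by the engine from another thread; just sets the flag.
        self._cancel_requested.set()

    def execute(self, work_items):
        for item in work_items:
            # Poll the flag between units of work.
            if self._cancel_requested.is_set():
                return False          # stopped early, work incomplete
            self.copied += 1          # stand-in for real work on item
        return True


cp = CopyFilesCheckpoint()
cp.cancel()                           # engine requests cancellation
finished = cp.execute(range(10))
```

Because the flag is only polled between work items, cancellation is cooperative: a long-running item still finishes before the checkpoint notices the request.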


Also, since it is up to the engine to decide which thread to cancel, how can the consumer know where it can resume, since it may not know what was currently executing?
Good point. Perhaps cancel_checkpoint() could return the name of the checkpoint that was canceled? If the application just wants to continue executing all un-executed checkpoints, it can call execute_checkpoint() with no start_from value.

A general comment on the format of this design doc: I found it hard to figure out which objects the methods were associated with because of the way you laid out the document. Perhaps describing the install engine class first, with all its named methods, along with any other classes such as the checkpoint class and all their methods, and then describing them in detail would make it easier to understand.
All methods are associated only with the InstallEngine class. I can move the Interface Table up so that all the interfaces are shown first and then discussed in detail later. Do you think that's better? The way I have it now was meant to summarize all the interfaces presented in the document.


Section 7.4: For resume_execute_checkpoint(), using a dictionary or something similar as input args, like suggested for execute_checkpoint(), would make this more extensible, imo.
Same reply as the argument list for register_checkpoint() above.


Section 8: Creation of the checkpoint data subtree. In my proposed schema, the checkpoints would be stored in the desired-system-state part of the DOC at the time we read them from the manifest. How does this work in terms of your storing the checkpoints later, only if they were executed successfully? Maybe not an issue; I'm just trying to understand why we need this engine-private data.
I believe the checkpoints proposed in your schema will be values that are used by the application. The application will query that information from the manifest and use it for the register_checkpoint() call. The engine does not get any checkpoint information from the DOC that it did not store there itself. The checkpoint data subtree is meant to store bookkeeping data for the engine to support stop and resume. Since we can only resume from previously successfully executed checkpoints, we only need to store data about those.

Thanks for reviewing again.

--Karen



thanks,
sarah


On 05/28/10 04:12 PM, Karen Tung wrote:
Hi,

The draft install engine design doc is ready for review.
I pushed the doc to the caiman-docs repo.  You can
find them under the install_engine directory in
ssh://[email protected]/hg/caiman/caiman-docs

If you don't want to clone the gate, you can also access
the file directly here:

http://cvs.opensolaris.org/source/xref/caiman/caiman-docs/install_engine

You will find 2 files in the directory:
* engine-design-doc.odt is the design document
* engine-sequence-diag-0528.png is the sequence diagram inserted into chapter 14 of the document. You might want to see the bigger image directly.

Please send your feedback by 6/11/2010.

If you plan to review the document, please send me an email privately,
and preferably also let me know when you will complete the review, so
I can plan accordingly and ensure the document is reviewed thoroughly.

Thanks,

--Karen
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
