Hi Sarah,
Thank you so much for reviewing the document. My responses are inline.
On 06/08/10 10:57, Sarah Jelinek wrote:
Hi Karen,
This is a really good design document. I do have some
comments/questions. I tried not to repeat others' comments, but if I
did, simply refer me to your response emails.
Section 3:
Data Object Cache is a singleton. All components access the
DataObjectCache directly without going through the engine. The
engine initializes the DataObjectCache. No other components should
initialize the DataObjectCache.
I don't believe this is a correct assumption. As a result of it not
being a singleton, it is likely to change some of the interfaces you
need to provide in the Engine.
I will update the document removing the assumption that DataObjectCache
will be a singleton.
In terms of providing an interface in the engine to reference the
data object cache, I don't think we need a special function. After
applications get a reference to the engine singleton, they can
simply use engine.doc. I will mention this in the document.
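For illustration, the access could look like this sketch
(get_instance() is a hypothetical name for the singleton accessor;
the document only commits to the engine exposing the cache as doc):

    engine = InstallEngine.get_instance()  # hypothetical accessor
    doc = engine.doc                       # the DataObjectCache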
Even though the engine runs the checkpoints, it does not report their
progress. Applications that want to monitor progress should
register one or more progress receivers with the logger.
Checkpoints report their progress directly to the logger. The engine
helps to normalize the
progress information for the logger. The install logger publishes
progress information received
from the checkpoints to all registered progress receivers.
So, looking at your UML diagram, it looks like the logger has to
call the engine's normalize_progress() method. Why then do we not
have consumers log using the engine and have the engine normalize
the data before sending it to the progress logger? It seems
backwards to me that the logger has to call the engine to normalize
this data and then send the normalized progress to the progress
receiver.
We want the InstallLogger to be responsible for all progress
reporting. Applications register progress receivers with the
logger, so I think it is more natural to have checkpoints report
their progress to the logger as well. That way, we present the
concept that the logger is the middleman that receives and
distributes the progress information.
We did this during the prototype, and it works well.
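To illustrate the middleman idea, a rough sketch (the receiver and
reporting method names are invented for illustration, not final
interfaces):

    # Application side: register a progress receiver with the logger.
    def on_progress(percent, message):
        print("%3d%% %s" % (percent, message))

    logger.register_progress_receiver(on_progress)  # illustrative name

    # Checkpoint side: report progress to the logger, not the engine.
    logger.report_progress(40, "transferring packages")  # illustrative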
Section 6:
Other things we need to allow for with regard to interaction with the
engine are:
-Getting a DOC object handle for consumers. This is based on the
DOC not being a singleton.
I will add information in the document on how the DOC can be accessed
via the engine.
-Getting a list of checkpoints registered.
Why do we need this? When we were reviewing the architecture
document, I asked about having the engine provide such an
interface. Your response was that since the application registered
the checkpoints, it would already have the list of registered
checkpoints. That made a lot of sense to me, so I didn't add such
an interface.
-unregister checkpoints
I cannot think of a use case for needing to unregister checkpoints.
Is it really necessary? If we were to provide the interface, I
think only un-executed checkpoints could be unregistered.
Checkpoints that have been executed could not be.
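If we did add it, the guard might look something like this sketch
(the attribute names are made up for illustration):

    def unregister_checkpoint(self, checkpoint_name):
        # Hypothetical interface: refuse to remove anything
        # that has already run.
        if checkpoint_name in self._executed_checkpoints:
            raise RuntimeError("cannot unregister an executed "
                               "checkpoint")
        self._registered_checkpoints.pop(checkpoint_name)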
Section 6.2.1:
loglevel: I thought that we wanted to be able to set log level per
checkpoint as well? Is this still the case? If it is, why do we have a
log level for the Engine as well?
Yes, we want to be able to set the log level per checkpoint. To set
a special log level for a checkpoint, the level is provided at
registration time of that checkpoint. The log level specified in
the engine's __init__() function will be the default used for the
entire application/engine/checkpoints, etc.
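For example (a sketch only; the argument names follow sections
6.2.1 and 6.3.4, but the values and module path are illustrative):

    import logging

    # Engine-wide default log level, set at engine creation.
    engine = InstallEngine(loglevel=logging.INFO)

    # Per-checkpoint override, supplied at registration time.
    engine.register_checkpoint("transfer", "/path/to/transfer.py",
                               "TransferCheckpoint",
                               log_level=logging.DEBUG)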
Debug: so, if set to false, when are the DOC snapshots removed from
/tmp? My question is how do we know if we want them removed if we are
planning on stopping and resuming?
DOC snapshots will be removed from /tmp when the application exits.
Section 6.7.1 talks about the __del__() function, which
will be called automatically for "destruction" of the engine object
instance.
Stopping and resuming out-of-process is only allowed for those
checkpoints that have DOC snapshots stored in the ZFS install
target. Before the ZFS install target is available, DOC snapshots
are stored in /tmp; once it is available, they are stored in the
ZFS install target.
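In other words, roughly (illustrative only, not actual engine
code):

    def snapshot_dir(zfs_target_available, zfs_target_path):
        # Snapshots in /tmp are removed when the app exits;
        # snapshots on the ZFS target survive the process, which
        # is why only those support out-of-process stop/resume.
        if zfs_target_available:
            return zfs_target_path
        return "/tmp"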
6.3.2: ordering of registered checkpoints
Why can't an application insert a new checkpoint in front of an
already executed checkpoint? It seems that if this happens, and we
resume from this newly inserted checkpoint, then any subsequent
checkpoint has to be re-run, regardless of its current state, which
means the engine must clean up the previously run checkpoints'
snapshots.
I can see that it might be useful to allow an application, or user,
to specify a new checkpoint that doesn't follow the last executed
checkpoint but comes before it, with the intent of starting from
the newly inserted checkpoint.
We could allow inserting a new checkpoint in front of an already
executed checkpoint the way you suggested above, but it is very
confusing, in my opinion. For example, suppose the application
registered A, B, C, D, E, F, and executed A, B, C, D.
Naturally, the application can resume from A, B, C, or D. If it
just wants to continue where it left off last time, it will start
from E.
Now suppose we allow registering a checkpoint A1 in the middle of
the executed checkpoints, say, between A and B. If I then want to
continue from the previous execution, I would "continue" from A1,
re-executing B, C, D... That's very confusing, I think.
Also, what if someone wants to resume from checkpoint C? That would
now be invalid.
I couldn't think of a use case where inserting a checkpoint in
front of already executed checkpoints would be useful within the
same invocation of an application.
Also, just to be clear, you *CAN* insert in front of already
executed checkpoints during a subsequent invocation of the
application. When you run the app again and register a set of
checkpoints with the "new" one inserted in the middle of the
checkpoints executed in the previous invocation, you will get
exactly the behavior you described above.
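To make the second-invocation case concrete, a sketch (the module
paths and the resume call's exact signature are assumed here):

    # Second invocation of the app: A1 is newly inserted between
    # A and B, which both ran in the previous invocation.
    for name in ("A", "A1", "B", "C", "D", "E", "F"):
        engine.register_checkpoint(name, "/path/to/%s.py" % name,
                                   name)

    # Resuming from A1 re-executes A1, B, C, D before E and F.
    engine.resume_execute_checkpoint(start_from="A1")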
Section 6.3.4:
I would think that the input args to register_checkpoint could be a
dictionary or something like that, rather than individual args.
Much like an nvlist allows for extensibility in the design, we
could do something similar here, in case we ever want or need to
change the parameters for registering a checkpoint. I am referring
specifically to the checkpoint_name, module_path, checkpoint_obj,
insert_before and log_level args.
Do you mean to use keyword arguments such as checkpoint_name=xxx,
module_path=xxx, etc.? Or do you mean to specify everything in a
dictionary, and then pass the whole dictionary as one single
argument?
Either way, I think it is not as good as the way it is currently
proposed.
The approach of specifying everything as keyword arguments is not
typical Python style. My observation in Python is that all required
arguments are explicitly spelled out. Then, optional arguments are
specified as keyword arguments, with their default values shown.
I find the approach of specifying them as a single dictionary very
confusing. You would first build the dictionary separately, and
then pass it in. Additionally, passing it in as a dictionary would
require the function to check that all the required arguments are
specified. Furthermore, people would not be able to see the default
values of the optional arguments.
I understand the desire for extensibility in the future. Since we
have *args and **kwargs passed into the register function as well,
it is up to the registration function to interpret however many
args and kwargs there are. All "unused" args and kwargs are simply
passed to the constructor of the checkpoint class.
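A sketch of the currently proposed shape (the helper calls here are
invented; the defaults are shown for illustration):

    def register_checkpoint(self, checkpoint_name, module_path,
                            checkpoint_obj, insert_before=None,
                            log_level=None, *args, **kwargs):
        # Required arguments are explicit positional parameters;
        # optional ones are keyword arguments with visible defaults.
        cp_class = self._load_checkpoint_class(module_path,
                                               checkpoint_obj)
        # Leftover args/kwargs go straight to the checkpoint
        # constructor.
        checkpoint = cp_class(checkpoint_name, *args, **kwargs)
        self._insert_checkpoint(checkpoint, insert_before, log_level)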
Section 6.5.1: cancel_checkpoint():
It seems that with Python's threading model, where there is no kill
method for threads, the cancel function is perhaps going to set a
flag in the checkpoint object, and the execute() method on that
object will have to poll and wait on a condition to stop itself. I
would think we would have to spell out in your interface
description what has to be done for both the execute() and cancel()
methods so implementers know how to make use of the cancel
functionality.
It's true that there's no kill method for threads. It is actually
up to the checkpoint developer to decide how they want to implement
cancel(). They can set a flag when cancel() is called and have
execute() check that flag. Or, if the checkpoint is very small,
they can choose to do nothing when cancel() is called, in which
case we just wait for execute() to finish.
I will add an example of how people can implement cancel().
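For instance, a flag-based cancel could look like this (the base
class name and the work-loop helpers are made up for illustration):

    import threading

    class MyCheckpoint(AbstractCheckpoint):  # base name assumed
        def __init__(self, name):
            AbstractCheckpoint.__init__(self, name)
            self._cancel_requested = threading.Event()

        def cancel(self):
            # Called by the engine from another thread.
            self._cancel_requested.set()

        def execute(self):
            for item in self.work_items():   # hypothetical helper
                if self._cancel_requested.is_set():
                    return                   # stop at a safe point
                self.process(item)           # hypothetical helper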
Also, since it is up to the engine to decide which thread to cancel,
how can the consumer know where it can resume, since it may not know
what was currently executing?
Good point. Perhaps cancel_checkpoint() could return the name of
the checkpoint that was canceled?
If the application just wants to continue executing all un-executed
checkpoints, it can call execute_checkpoint() with no start_from
value.
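From the application's point of view, that would look something
like this (the return value is only a proposal at this point):

    canceled_name = engine.cancel_checkpoint()  # proposed return
    # Later, to simply pick up everything that has not run yet:
    engine.execute_checkpoint()                 # no start_from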
A general comment on the format of this design doc: I found it hard
to figure out which objects the methods were associated with
because of the way you laid out the document. Perhaps describing
the install engine class first, with all of its named methods,
along with any other classes such as the checkpoint class and their
methods, and then describing them in detail, would make it easier
to understand.
All the methods are associated only with the InstallEngine class.
I can move the Interface Table up so all the interfaces are shown
first and then discussed in detail later. Do you think that's
better? The table as I have it now was meant to summarize all the
interfaces presented in the document.
Section 7.4: resume_execute_checkpoint: using a dictionary or
something similar for the input args, as suggested for
execute_checkpoint(), would make this more extensible, imo.
Same reply as the argument list for register_checkpoint() above.
Section 8: Creation of the checkpoint data subtree: in my proposed
schema, the checkpoints would be stored in the desired system state
part of the DOC at the time we read them from the manifest. How
does this work in terms of your storing the checkpoints later, and
only if they were executed successfully? Maybe not an issue; just
trying to understand why we need this engine-private data.
I believe the checkpoints in your schema proposal will be values
used by the application. The application will query that
information from the manifest and use it for the
register_checkpoint() call. The engine does not get any checkpoint
information from the DOC that it did not store there itself. The
checkpoint data subtree is meant to store bookkeeping data for the
engine to support stop and resume. Since we can only resume from
previously successfully executed checkpoints, we only need to store
data about those.
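So the flow is roughly (the manifest-query helper is invented; only
register_checkpoint() comes from the document):

    # The application, not the engine, pulls checkpoint info out
    # of the manifest data in the DOC and registers each one.
    for cp in read_checkpoints_from_manifest(doc):  # hypothetical
        engine.register_checkpoint(cp.name, cp.module_path,
                                   cp.class_name)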
Thanks for reviewing again.
--Karen
thanks,
sarah
On 05/28/10 04:12 PM, Karen Tung wrote:
Hi,
The draft install engine design doc is ready for review.
I pushed the doc to the caiman-docs repo. You can
find it under the install_engine directory in
ssh://[email protected]/hg/caiman/caiman-docs
If you don't want to clone the gate, you can also access
the file directly here:
http://cvs.opensolaris.org/source/xref/caiman/caiman-docs/install_engine
You will find 2 files in the directory:
* engine-design-doc.odt is the design document
* engine-sequence-diag-0528.png is the sequence diagram inserted into
chapter 14 of the document. You might want to see the bigger image
directly.
Please send your feedback by 6/11/2010.
If you plan to review the document, please send me an email privately,
and preferably also let me know when you will complete the review, so
I can plan accordingly and ensure the document is reviewed thoroughly.
Thanks,
--Karen
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss