Hi Jean,
Thank you again for reviewing the document. My responses are inline.
On 06/09/10 10:07, jean.mccormack wrote:
Part 2.....
7.4 Implementation
---------------------------------
Why do you change where the snapshots of the DOC are stored? If for
the first n checkpoints they are in /tmp
why even bother with putting the last x snapshots somewhere else?
Also, what if there are multiple install targets?
Where do the snapshots go then? Also, sometimes the snapshots are in
/tmp/doc_<checkpoint name>_<pid>. What
happens in the case of DC on a resume when the pid is now different.
How do you know where to get the info from?
* DOC snapshots are stored in /tmp before the install target is
available for storing them. When the install
target is available, I store the DOC snapshots in the ZFS datasets so
they are included in the ZFS snapshots of
the install target too.
* If there are multiple install targets, I was planning to just use the
first install target. Dave made a point
in his review comment that the engine should provide a function for the
application to explicitly
specify what ZFS dataset to use for storing the snapshots. I think
that's a great idea. If we
do it that way, we also won't need to worry about having multiple
install targets.
* The DOC snapshots stored in /tmp before the ZFS dataset is available
are only used for in-process resume.
For the case of DC, where the pid is different on a subsequent
invocation of DC, users are not allowed
to resume at checkpoints that ran before the ZFS dataset was available,
because, as you said, we won't be able
to access the files from /tmp anymore.
You say this:
If ZFS dataset is not available, the engine will query the
DataObjectCache to see if the install
target is set, and if so, whether it is created.
That kind of leaves me hanging. If it is created what happens? And
likewise, what if it isn't there?
If it is created and available, it will be used. If it is not ready to
be used, we will not
take any ZFS snapshot. We will just take the DOC snapshot and put it in
/tmp.
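A rough Python sketch of the storage rule described above, for illustration only. It is not the engine's code: the function name choose_doc_snapshot_path, the dataset_mountpoint argument, and the ".doc_snapshots" directory name are all hypothetical. Only the /tmp/doc_<checkpoint name>_<pid> pattern comes from this thread.

```python
import os
import posixpath


def choose_doc_snapshot_path(checkpoint_name, dataset_mountpoint=None, pid=None):
    """Return (path, zfs_snapshot_taken) for one checkpoint's DOC snapshot.

    dataset_mountpoint is a hypothetical argument: the mountpoint of the ZFS
    dataset chosen for snapshots, or None while no dataset is ready.
    """
    if dataset_mountpoint is not None:
        # Dataset is created and available: keep the DOC snapshot inside it
        # so that the ZFS snapshot captures it as well.
        path = posixpath.join(dataset_mountpoint, ".doc_snapshots",
                              "doc_%s" % checkpoint_name)
        return path, True
    # No usable dataset yet: take the DOC snapshot only and put it in /tmp,
    # tagged with the pid, which makes it usable for in-process resume only.
    if pid is None:
        pid = os.getpid()
    return "/tmp/doc_%s_%d" % (checkpoint_name, pid), False
```

The second element of the returned tuple records whether a ZFS snapshot goes with the DOC snapshot, which is what decides later whether the checkpoint can be resumed from a new process.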
7.4.1 Determining which checkpoint can be resumed to
----------------------------------------------------------------------------------------------
This is misleading:
The checkpoint must be registered at exactly the same position in the
checkpoint list as the
previous invocation of the application.
Then you go on to explain which is in my mind correct. But the first
bullet I mention above is
not the same as your explanation.
Does my explanation in the first bullet above help?
The DOC contains information on all the successfully executed
checkpoints. So, having
a snapshot of it will help determine which checkpoint was last executed,
and in what
order the checkpoints were executed.
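One possible reading of the position rule, sketched in Python. This is an assumption about how the comparison could work, not the algorithm from the design doc: resumable_checkpoints and both argument names are hypothetical, and the execution order is assumed to be recoverable from the DOC snapshot as a plain list of names.

```python
def resumable_checkpoints(executed_in_doc, registered_now):
    """Return the names in registered_now that line up, position by
    position, with the execution order recorded in the DOC snapshot."""
    resumable = []
    for previous, current in zip(executed_in_doc, registered_now):
        if previous != current:
            # The registration order diverged from the previous run, so
            # nothing from this position onward can be matched.
            break
        resumable.append(current)
    return resumable
```

For example, if the DOC records a, b, c and the application now registers a, b, x, c, only a and b still sit at the same positions as before.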
page 16
7.5 Function Definition
---------------------------------------
when talking about resume_execute_checkpoint you say:
This function can only be called once in each invocation of the
application.
Why? I would think that if you had checkpoints a,b,c,d,e,f you could
resume from b and pause at c and then resume from c and pause at e if
you wanted.
resume_execute_checkpoint() is used for cases where one wants to resume
from a previous
invocation of the program. So, it checks whether the
specified checkpoint_name
is resumable based on the "rules", rolls back to the ZFS snapshot
and DOC snapshot, and so on.
Let me clarify this in the function description.
To do the example you had above, in the same process,
you don't need to use resume_execute_checkpoint(). You can
just do it with multiple execute_checkpoints(), for example:
execute_checkpoints(start_from="a", pause_at="c")
execute_checkpoints(start_from="c")
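A toy model of the in-process case, to show how the b/c/e example from the question maps onto repeated calls. ToyEngine is hypothetical and only mirrors the call shape used above. It also assumes that pause_at stops before the named checkpoint runs, which is what the a/c example suggests but is not stated outright.

```python
class ToyEngine:
    """Toy stand-in for the engine; only the call shape mirrors the email."""

    def __init__(self, names):
        self.names = list(names)   # registered checkpoint names, in order
        self.executed = []         # running record of what has been run

    def execute_checkpoints(self, start_from=None, pause_at=None):
        start = 0 if start_from is None else self.names.index(start_from)
        # Assumption: pause_at stops *before* running the named checkpoint.
        stop = len(self.names) if pause_at is None else self.names.index(pause_at)
        ran = self.names[start:stop]
        self.executed.extend(ran)  # a real checkpoint would do its work here
        return ran


engine = ToyEngine("abcdef")
first = engine.execute_checkpoints(start_from="b", pause_at="c")
second = engine.execute_checkpoints(start_from="c", pause_at="e")
```

Both calls happen in the same process, so nothing has to be rolled back or reloaded between them.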
page 18
10.1 Progress Estimates
-------------------------------------------
You say this:
"The checkpoint developer should run their checkpoint on this
standardized machine, and based on some metric, the amount of time it
takes to execute that checkpoint will be converted to a value that
will be returned as weight."
My issue with this is that it doesn't take into account things like
number of packages being installed or size of the area being cpio'd
etc. For some checkpoints your statement will work, for others it
won't. Specifically, I think it won't for target discovery or transfer.
Yes, all the things you mentioned above should be included in the
progress estimate calculation by the checkpoint.
The speed of the network, the speed of the disk, processor speed, etc. vary
greatly and would affect performance,
even if we were to install 1 package or discover 1 target. Therefore,
I talked about
using a standardized machine to fix those unknown values, and letting the
other "measurable"
variables, like the number of packages to install and the size of the image
to cpio,
be calculated based on that.
Let me add more detail to this section to clarify.
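A minimal sketch of that idea, assuming a simple linear cost model. Everything here is hypothetical: the function names, the fixed/per-unit split, and the numbers in the usage lines are made up for illustration and are not measurements or the engine's API.

```python
def checkpoint_weight(fixed_seconds, seconds_per_unit, units):
    """Estimated cost of one checkpoint on the standardized machine.

    fixed_seconds and seconds_per_unit would be measured once on the
    reference machine; units is the measurable variable known at run time
    (number of packages, megabytes to cpio, and so on).
    """
    return fixed_seconds + seconds_per_unit * units


def progress_shares(weights):
    """Turn {checkpoint name: weight} into each checkpoint's fraction of
    the overall progress. The fractions sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return {name: weight / total for name, weight in weights.items()}


weights = {
    "target_discovery": checkpoint_weight(5, 2, 10),   # 10 targets
    "transfer": checkpoint_weight(15, 0.5, 120),       # 120 packages
}
shares = progress_shares(weights)
```

With these made-up numbers, transfer accounts for three quarters of the progress bar and target discovery for the remaining quarter.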
Thanks again for your review.
--Karen
On 05/28/10 04:12 PM, Karen Tung wrote:
Hi,
The draft install engine design doc is ready for review.
I pushed the doc to the caiman-docs repo. You can
find them under the install_engine directory in
ssh://[email protected]/hg/caiman/caiman-docs
If you don't want to clone the gate, you can also access
the file directly here:
http://cvs.opensolaris.org/source/xref/caiman/caiman-docs/install_engine
You will find 2 files in the directory:
* engine-design-doc.odt is the design document
* engine-sequence-diag-0528.png is the sequence diagram inserted into
chapter 14 of the document. You might want to see the bigger image
directly.
Please send your feedback by 6/11/2010.
If you plan to review the document, please send me an email privately,
and preferably also let me know when you will complete the review, so
I can plan accordingly and ensure the document is reviewed thoroughly.
Thanks,
--Karen
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss