Hi Jean,

Thank you again for reviewing the document.  My responses are inline.

On 06/09/10 10:07, jean.mccormack wrote:
Part 2.....

7.4 Implementation
---------------------------------
Why do you change where the snapshots of the DOC are stored? If for the first n checkpoints they are in /tmp, why even bother putting the last x snapshots somewhere else? Also, what if there are multiple install targets? Where do the snapshots go then? Also, sometimes the snapshots are in /tmp/doc_<checkpoint name>_<pid>. What happens in the case of DC on a resume, when the pid is now different? How do you know where to get the info from?

* DOC snapshots are stored in /tmp until the install target is available for storing them. Once the install target is available, I store the DOC snapshots in its ZFS dataset so they are included in the ZFS snapshots of the install target too, and, unlike the files in /tmp, they remain accessible to later invocations of the application.

* If there are multiple install targets, I was planning to just use the first one. Dave suggested in his review comments that the engine provide a function for the application to explicitly specify which ZFS dataset to use for storing the snapshots. I think that's a great idea. If we do it that way, we also won't need to worry about multiple install targets.

* The DOC snapshots stored in /tmp before the ZFS dataset is available are used only for in-process resume. For DC, where the pid is different on each subsequent invocation, users are not allowed to resume at checkpoints that run before the ZFS dataset is available because, as you said, we won't be able to access those files in /tmp anymore.
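A minimal sketch of the cross-invocation rule in the last bullet, assuming the engine records, for each checkpoint, whether its DOC snapshot went to /tmp or into the ZFS dataset. SnapshotRecord, resumable_in_new_process, the checkpoint names and the dataset path are hypothetical, for illustration only; they are not the engine's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnapshotRecord:
    # Hypothetical record of where one checkpoint's DOC snapshot was written.
    checkpoint_name: str
    path: str
    in_zfs_dataset: bool  # False: kept in /tmp/doc_<checkpoint name>_<pid>


def resumable_in_new_process(records: List[SnapshotRecord]) -> List[str]:
    """Checkpoints that a later invocation (for example, DC) may resume at.

    Snapshots in /tmp are named with the pid of the process that wrote them,
    so a new process cannot count on finding them; only checkpoints whose DOC
    snapshot lives in the ZFS dataset are offered.
    """
    return [r.checkpoint_name for r in records if r.in_zfs_dataset]


records = [
    SnapshotRecord("target-discovery", "/tmp/doc_target-discovery_1234", False),
    SnapshotRecord("transfer", "/a/.doc_transfer", True),  # hypothetical path
]
print(resumable_in_new_process(records))  # ['transfer']
```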

You say this:
If ZFS dataset is not available, the engine will query the DataObjectCache to see if the install target is set, and if so, whether it is created.

That kind of leaves me hanging. If it is created what happens? And likewise, what if it isn't there?
If the install target is set and created, its ZFS dataset will be used. If it is not set, or not yet ready to be used, we will not take any ZFS snapshot; we will just take the DOC snapshot and put it in /tmp.
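The decision above can be sketched as follows, assuming the install target found in the DataObjectCache can be reduced to a mountpoint plus a "created" flag. InstallTarget, plan_snapshot and the ".doc_<name>" file name inside the dataset are hypothetical; only the /tmp/doc_<checkpoint name>_<pid> naming comes from the thread.

```python
import os
from typing import NamedTuple, Optional, Tuple


class InstallTarget(NamedTuple):
    # Hypothetical stand-in for the install target found in the DataObjectCache.
    mountpoint: str
    created: bool


def plan_snapshot(checkpoint_name: str,
                  target: Optional[InstallTarget],
                  pid: int) -> Tuple[bool, str]:
    """Return (take_zfs_snapshot, doc_snapshot_path) for one checkpoint."""
    if target is not None and target.created:
        # Target is set and its dataset exists: keep the DOC snapshot inside
        # the dataset so the ZFS snapshot of the install target includes it.
        return True, os.path.join(target.mountpoint, ".doc_" + checkpoint_name)
    # Target not set, or not created yet: no ZFS snapshot; the DOC snapshot
    # goes to /tmp, named with the pid of the current process.
    return False, "/tmp/doc_%s_%d" % (checkpoint_name, pid)


print(plan_snapshot("transfer", InstallTarget("/a", False), 1234))  # (False, '/tmp/doc_transfer_1234')
print(plan_snapshot("transfer", InstallTarget("/a", True), 1234))   # (True, '/a/.doc_transfer')
```

Passing the pid in explicitly, rather than always calling os.getpid(), keeps the sketch testable; a real engine would use its own pid.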


7.4.1 Determining which checkpoint can be resumed to
----------------------------------------------------------------------------------------------
This is misleading:
The checkpoint must be registered at exactly the same position in the checkpoint list as the previous invocation of the application.

Then you go on to explain it, and that explanation is, in my mind, correct. But the first bullet I quoted above is not the same as your explanation.
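Does my explanation in the first bullet above help?
The DOC contains information about all successfully executed checkpoints, so a snapshot of it tells us which checkpoint was executed last and the order in which the checkpoints were executed. Comparing that order with the checkpoints registered in the current invocation tells us whether a checkpoint is at the same position as before.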
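One possible reading of that rule, as a sketch: assume the executed-checkpoint order has already been recovered from the DOC snapshot as a list of names. A checkpoint is then treated as resumable only if the checkpoints registered before it match, position by position, the ones executed before it, and resuming never goes past the first checkpoint that did not run. The function name resumable_checkpoints is hypothetical.

```python
from typing import List


def resumable_checkpoints(executed: List[str], registered: List[str]) -> List[str]:
    """Sketch: which registered checkpoints can be resumed at.

    executed:   order of successfully executed checkpoints from the DOC snapshot.
    registered: checkpoint list registered in the current invocation.
    """
    result = []
    for i, name in enumerate(registered):
        # Resuming at position i needs positions 0..i-1 to match the previous run.
        if registered[:i] != executed[:i]:
            break
        result.append(name)
        # Nothing beyond the first checkpoint that did not run can be resumed at.
        if i >= len(executed):
            break
    return result


print(resumable_checkpoints(["a", "b", "c"], ["a", "b", "c", "d", "e"]))  # ['a', 'b', 'c', 'd']
```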

page 16
7.5 Function Definition
---------------------------------------
When talking about resume_execute_checkpoint, you say:
This function can only be called once in each invocation of the application.
Why? I would think that if you had checkpoints a, b, c, d, e, f, you could resume from b and pause at c, and then resume from c and pause at e if you wanted.
resume_execute_checkpoint() is meant for resuming from a previous invocation of the program. It checks whether the specified checkpoint_name is resumable according to the "rules", rolls back the ZFS snapshot and the DOC snapshot, etc.
Let me clarify this in the function description.

To run your example above within the same process,
you don't need resume_execute_checkpoint(). You can
just call execute_checkpoints() multiple times, for example:

execute_checkpoints(start_from="a", pause_at="c")
execute_checkpoints(start_from="c")
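To show what those two calls do with checkpoints a through f, here is a toy model, not the real engine. It assumes pause_at stops just before the named checkpoint, which is what the two calls above imply, since the second one starts at c. ToyEngine and its ran list are hypothetical; only execute_checkpoints() and its start_from/pause_at parameters come from this thread.

```python
from typing import List, Optional


class ToyEngine:
    """Toy model of pause/continue within one process; not the real engine."""

    def __init__(self, checkpoints: List[str]):
        self.checkpoints = checkpoints
        self.ran: List[str] = []  # order in which checkpoints actually ran

    def execute_checkpoints(self, start_from: Optional[str] = None,
                            pause_at: Optional[str] = None) -> None:
        # Start at the named checkpoint (or the first one) and stop just
        # before pause_at, or run to the end if pause_at is None.
        start = self.checkpoints.index(start_from) if start_from else 0
        stop = self.checkpoints.index(pause_at) if pause_at else len(self.checkpoints)
        for name in self.checkpoints[start:stop]:
            self.ran.append(name)  # a real engine would execute the checkpoint here


engine = ToyEngine(["a", "b", "c", "d", "e", "f"])
engine.execute_checkpoints(start_from="a", pause_at="c")
print(engine.ran)  # ['a', 'b']
engine.execute_checkpoints(start_from="c")
print(engine.ran)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

No rollback is involved here: the second call simply continues from where the first one paused, which is why resume_execute_checkpoint() is not needed within one process.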



page 18

10.1 Progress Estimates
-------------------------------------------
You say this:
"The checkpoint developer should run their checkpoint on this standardized machine, and based on some metric, the amount of time it takes to execute that checkpoint will be converted to a value that will be returned as weight."

My issue with this is that it doesn't take into account things like the number of packages being installed or the size of the area being cpio'd. For some checkpoints your statement will work; for others it won't. Specifically, I think it won't work for target discovery or transfer.


Yes, the checkpoint should include all the things you mentioned above in its progress estimate calculation.

Network speed, disk speed, processor speed, etc. vary greatly between machines and affect performance, even if we install only 1 package or discover only 1 target. That is why I proposed using a standardized machine to fix those unknown values, and then calculating the weight from the other "measurable" variables, such as the number of packages to install and the size of the image to cpio, based on timings from that machine.
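As a rough sketch of that idea, assume a simple linear model: time per unit of work measured once on the standardized machine, multiplied by the measurable amount of work in this run, plus a fixed part. The function names, the linear model, and the numbers (5 targets, 40 packages) are all hypothetical, for illustration only.

```python
from typing import Dict


def checkpoint_weight(seconds_per_unit: float, units: int,
                      fixed_seconds: float = 0.0) -> float:
    """Weight = time the work would take on the standardized machine.

    seconds_per_unit: measured on the standardized machine, e.g. per package
    installed or per MB copied by cpio (hypothetical unit choices).
    units: the measurable amount of work in this run.
    fixed_seconds: part of the run that does not scale with the work.
    """
    return fixed_seconds + seconds_per_unit * units


def progress_shares(weights: Dict[str, float]) -> Dict[str, float]:
    # Fraction of the overall progress that each checkpoint accounts for.
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


weights = {
    "target-discovery": checkpoint_weight(2.0, 5),   # 5 targets, hypothetical
    "transfer": checkpoint_weight(0.5, 40, 10.0),    # 40 packages, hypothetical
}
print(weights)                   # {'target-discovery': 10.0, 'transfer': 30.0}
print(progress_shares(weights))  # {'target-discovery': 0.25, 'transfer': 0.75}
```

Because every weight is expressed in "standardized machine seconds", the weights of different checkpoints stay comparable even though actual run times differ from machine to machine.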

Let me add more detail to this section to clarify.

Thanks again for your review.

--Karen




On 05/28/10 04:12 PM, Karen Tung wrote:
Hi,

The draft install engine design doc is ready for review.
I pushed the doc to the caiman-docs repo.  You can
find them under the install_engine directory in
ssh://[email protected]/hg/caiman/caiman-docs

If you don't want to clone the gate, you can also access
the file directly here:

http://cvs.opensolaris.org/source/xref/caiman/caiman-docs/install_engine

You will find 2 files in the directory:
* engine-design-doc.odt is the design document
* engine-sequence-diag-0528.png is the sequence diagram inserted into chapter 14 of the document. You might want to view the full-size image directly.

Please send your feedback by 6/11/2010.

If you plan to review the document, please send me an email privately,
and preferably also let me know when you will complete the review, so
I can plan accordingly and ensure the document is reviewed thoroughly.

Thanks,

--Karen
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss

