Here's a summary of discussions we had this morning
on the Install Execution Engine design.
1) Application's usage of InstallEngine.execute_checkpoints() and threading:
- If an application calls execute_checkpoints() with
a callback function, execute_checkpoints() will return after
all checkpoints are instantiated. When the thread executing
the checkpoints completes, the callback function provided
by the application will be called.
- If an application calls execute_checkpoints() without
providing a callback function, execute_checkpoints() will not
return until all checkpoints have been executed.
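To make the two calling modes concrete, here is a rough sketch, not the actual engine code. The class body, the _run_all() helper, the use of plain callables as checkpoints, and the returned thread handle are all assumptions made for illustration; only the execute_checkpoints() name and the callback/no-callback behavior come from the discussion.

```python
import threading


class InstallEngine:
    """Hypothetical stand-in for the engine; only the threading behavior is sketched."""

    def __init__(self, checkpoints):
        # Plain callables stand in for real checkpoint objects (assumption).
        self._checkpoints = list(checkpoints)

    def _run_all(self, callback=None):
        # Hypothetical helper: run every checkpoint in order, then notify.
        results = [checkpoint() for checkpoint in self._checkpoints]
        if callback is not None:
            callback(results)
        return results

    def execute_checkpoints(self, callback=None):
        if callback is None:
            # No callback: do not return until all checkpoints are executed.
            return self._run_all()
        # Callback given: run in a separate thread and return right away.
        worker = threading.Thread(target=self._run_all, args=(callback,))
        worker.start()
        # Returning the thread is an assumption so that callers can join it.
        return worker
```

In this sketch the callback runs on the worker thread once the last checkpoint finishes, so an application that updates a UI from it would need to hand the result back to its own main thread.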
2) Canceling a checkpoint
- It's the application's responsibility to set up a signal handler to process
signals such as Ctrl-C (SIGINT).
- When the engine receives a cancel_checkpoints() request, it will
call the cancel() function of the checkpoint that's executing.
- The default implementation for the AbstractCheckpoint.cancel() function
will be to set a threading.Event variable.
Checkpoints that do not override the default cancel() implementation should
check the value of this variable using the is_set() function and, if it is
set, perform the necessary cleanup and exit.
- Checkpoints that do not want to use the default cancel()
implementation can override it with their
own implementation when they subclass the AbstractCheckpoint object.
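The default cancel() behavior described above could look roughly like the sketch below. The attribute name _cancel_requested, the execute() method, and the CopyFiles checkpoint are hypothetical names chosen for illustration; only AbstractCheckpoint, cancel(), threading.Event, and is_set() come from the discussion.

```python
import threading
from abc import ABC, abstractmethod


class AbstractCheckpoint(ABC):
    def __init__(self, name):
        self.name = name
        # Attribute name is an assumption; the summary only says an Event is set.
        self._cancel_requested = threading.Event()

    def cancel(self):
        """Default implementation: just record that a cancel was requested."""
        self._cancel_requested.set()

    @abstractmethod
    def execute(self):
        """Hypothetical entry point that the engine would call."""


class CopyFiles(AbstractCheckpoint):
    """Hypothetical checkpoint that relies on the default cancel()."""

    def __init__(self, name, files):
        super().__init__(name)
        self.files = list(files)
        self.copied = []
        self.cleaned_up = False

    def execute(self):
        for path in self.files:
            # Poll the event between units of work.
            if self._cancel_requested.is_set():
                self.cleaned_up = True  # stands in for real cleanup work
                return False
            self.copied.append(path)
        return True
```

A checkpoint that wraps a long-running external command would probably override cancel() instead, since it cannot poll the event while blocked.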
3) stop-on-error
- If stop-on-error is false, the engine will continue executing all
checkpoints despite exceptions from one or more of the checkpoints.
- DOC and/or ZFS snapshots will be taken after each checkpoint is
executed, despite the exception(s). If the application wants to resume
at a previously failed checkpoint and the stop-on-error flag is false,
the application is allowed to resume at that checkpoint if the other
resume requirements are met.
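As a sketch of the stop-on-error behavior, the loop below keeps going after a failure when the flag is false and still records a snapshot for the failed checkpoint. The function name, the (name, callable) checkpoint representation, and the list standing in for DOC/ZFS snapshots are assumptions; resume handling is not modeled.

```python
def run_checkpoints(checkpoints, stop_on_error=True):
    """Run (name, callable) pairs; return (snapshots, errors).

    Hypothetical helper: 'snapshots' is a list of checkpoint names standing
    in for the DOC and/or ZFS snapshots the engine would take.
    """
    snapshots = []
    errors = []
    for name, func in checkpoints:
        try:
            func()
        except Exception as exc:
            errors.append((name, exc))
            if stop_on_error:
                # Stop at the first failure; no snapshot for the failed one.
                break
        # Reached on success, and on failure when stop_on_error is false.
        snapshots.append(name)
    return snapshots, errors
```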
4) AbstractCheckpoint.get_progress_estimate()
- This function will return the number of seconds it takes to execute
the checkpoint, as measured by the wall clock on a standardized machine.
- Developers who do not have access to the standardized machine, or who
are working after the standardized machine has become obsolete, can run
an existing checkpoint that performs operations similar to their own on
any available machine and use the result as guidance for approximating the
number of seconds it takes to run the newly developed checkpoint.
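One way a developer might time an existing, similar checkpoint is sketched below. The helper name, the SimilarCheckpoint and NewCheckpoint classes, and the execute() method are hypothetical; only get_progress_estimate() and the idea of wall-clock seconds come from the discussion.

```python
import math
import time


def measure_wall_clock_seconds(run):
    """Return the elapsed wall-clock seconds taken by calling run()."""
    start = time.monotonic()
    run()
    return time.monotonic() - start


class SimilarCheckpoint:
    """Hypothetical existing checkpoint doing comparable work."""

    def execute(self):
        time.sleep(0.05)  # stands in for the real work


class NewCheckpoint:
    """Hypothetical new checkpoint that reuses the measured figure."""

    def __init__(self, estimate_seconds):
        self._estimate_seconds = estimate_seconds

    def get_progress_estimate(self):
        return self._estimate_seconds


elapsed = measure_wall_clock_seconds(SimilarCheckpoint().execute)
# Round up so that even a very short checkpoint reports at least one second.
new_checkpoint = NewCheckpoint(math.ceil(elapsed))
```

Since the number is only guidance, a single run on whatever machine is available is probably good enough; averaging a few runs would smooth out noise.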
5) Keith's question about using the Error Service module (errsvc) for storing
exceptions raised by the checkpoints, instead of storing the exceptions
as a list.
- The Error Service module is suitable and can be used with some
modifications.
- The ErrorInfo object can be used to store the exception.
The mod_id in the ErrorInfo object can be used for storing the name of
the checkpoint that raised the exception.
- As currently implemented, the ErrorInfo object only accepts
"integer" and "string" as the error data type. It needs to be modified to
accept an "object", which will be used for storing the exception raised
by the checkpoint.
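The proposed change can be illustrated with a simplified stand-in. This is not the real errsvc code: the class body, the store() method, the "exception" key, and record_failure() are all hypothetical, and only the ErrorInfo and mod_id names and the idea of storing the exception object come from the discussion.

```python
class ErrorInfo:
    """Simplified stand-in, NOT the real errsvc ErrorInfo implementation."""

    def __init__(self, mod_id):
        # mod_id holds the name of the checkpoint that raised the exception.
        self.mod_id = mod_id
        self.error_data = {}

    def store(self, key, value):
        # Hypothetical method: accepts any object, including an exception,
        # rather than only integers and strings.
        self.error_data[key] = value


def record_failure(checkpoint_name, func):
    """Run func(); return an ErrorInfo if it raised, otherwise None."""
    try:
        func()
    except Exception as exc:
        info = ErrorInfo(checkpoint_name)
        info.store("exception", exc)
        return info
    return None
```

Keeping the exception object itself, rather than its string form, lets the application inspect the exception type later.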
6) After a checkpoint completes successfully, the engine will always send
a progress update to the logger on the overall percentage complete. This
allows accurate progress to be reported even if a checkpoint does not report
intermediate progress.
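A sketch of how the overall percentage might be reported after each checkpoint follows. Weighting each checkpoint by its estimate in seconds is an assumption, not something decided in the discussion, and the logger name and function names are hypothetical.

```python
import logging

logger = logging.getLogger("install_engine_sketch")  # hypothetical logger name


def overall_percent(estimates, completed):
    """Percentage of the total estimated seconds that has completed."""
    total = sum(estimates.values())
    if total == 0:
        return 100.0
    done = sum(estimates[name] for name in completed)
    return 100.0 * done / total


def run_with_progress(checkpoints, estimates):
    """Run (name, callable) pairs and log overall progress after each one."""
    completed = []
    reported = []
    for name, func in checkpoints:
        func()
        completed.append(name)
        percent = overall_percent(estimates, completed)
        # Sent even if the checkpoint itself reported no intermediate progress.
        logger.info("%s complete, overall progress %.0f%%", name, percent)
        reported.append(percent)
    return reported
```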
Please let me know if you have any questions or comments on this summary.
Thanks,
--Karen