Hi Sarah,
Comments inline.
On 07/19/10 10:26, Sarah Jelinek wrote:
Hi Karen,
On 07/17/10 01:30 AM, Karen Tung wrote:
During the implementation of the engine, we found a problem
with using errsvc.
Background:
------------------------
- Only one instance of errsvc exists in the namespace of the application
and all the modules and libraries it uses.
- Errors (instances of ErrorInfo) are stored in a single list called
_ERRORS in errsvc.py.
- The engine is going to use errsvc to store exception(s) raised by
checkpoints' execute() method. The engine will use the checkpoint name
as the mod_id for the ErrorInfo objects so the application can easily
identify which checkpoint failed (a stand-in sketch of this is below).
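Something like this (not the real errsvc.py or engine code, just an
illustration of the single shared _ERRORS list and of ErrorInfo objects
keyed by checkpoint name as described above):

    # Stand-in sketch only -- not the real errsvc.py or engine code.
    _ERRORS = []                          # the single, application-wide error list

    class ErrorInfo(object):
        def __init__(self, mod_id, exception):
            self.mod_id = mod_id          # the checkpoint name, per the engine design
            self.exception = exception    # the exception raised by execute()

    def record_checkpoint_failure(checkpoint_name, exception):
        # Roughly what the engine would do when a checkpoint's execute()
        # raises: one ErrorInfo per failure, keyed by the checkpoint name.
        _ERRORS.append(ErrorInfo(checkpoint_name, exception))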
Problem:
-----------------
- If the application chooses to execute the same checkpoint multiple
times, and all of the executions fail, multiple ErrorInfo objects with
the same mod_id will be added to the list of errors. The application
cannot easily figure out which ErrorInfo belongs to which invocation
of that same checkpoint.
How likely is this scenario to actually happen? I assume that what you
are saying is that only 1 checkpoint of this type is registered but
the app runs it multiple times? Is it likely that the same code paths
would be run in the multiple invocations? Even if the mod_id is the
same, the error 'stack' would be different, wouldn't it, with multiple
runs of the same checkpoint? Wouldn't it be likely that something
would be different, such as the input, across the multiple invocations? I
think there has to be some differentiating data even if you run the
same checkpoint multiple times. Maybe I am not seeing a scenario where
this could happen without anything different?
The engine allows the same set of checkpoints to be re-run. So, depending
on how the application chooses to run the checkpoints, the same
checkpoint(s) can fail in different execute_checkpoint() calls. We are
using the checkpoint name, which is supposed to be unique across
checkpoints, as the mod_id value for errsvc. As you mentioned, even if
the same checkpoint fails multiple times, the stack trace would be
different. However, the problem is that the application will not be able
to distinguish which invocation of the checkpoint the error comes from.
For example:
- The application calls execute_checkpoints() to run checkpoints A, B, C,
and D. Checkpoint D raises an exception, which is saved in errsvc. At
this point, errsvc has one ErrorInfo on its list, with the mod_id "D"
and the associated exception.
- Assume the application "looked at" the exception but didn't clear it
from errsvc.
- The application calls execute_checkpoints() to run checkpoints B, C,
and D. Checkpoint D raises an exception again. At this point, errsvc has
two ErrorInfo objects on its list, both with mod_id "D". The exceptions
stored in the two ErrorInfo objects might be different. When the engine
returns control to the application and informs it that checkpoint D
failed, the application will search errsvc for a mod_id of "D" and will
find two failures this time. How does it know which one is associated
with the latest invocation?
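In stand-in code (an illustration only, not the real errsvc API), the
situation after the second run looks like this:

    # Illustrative only: plain dicts stand in for ErrorInfo objects on
    # errsvc's single error list.
    errors = []

    # First execute_checkpoints(A, B, C, D): D fails.
    errors.append({"mod_id": "D", "exception": RuntimeError("failure in run 1")})

    # The application looks at the error but does not clear the list.

    # Second execute_checkpoints(B, C, D): D fails again.
    errors.append({"mod_id": "D", "exception": RuntimeError("failure in run 2")})

    # The application is told checkpoint D failed and searches by mod_id:
    d_failures = [e for e in errors if e["mod_id"] == "D"]
    assert len(d_failures) == 2   # two entries -- which one is from the latest run?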
Possible Solutions:
-------------------------------
1) engine.execute_checkpoint() will always call
errsvc.clear_error_list() before it executes any checkpoint. This way,
when execution completes, errsvc will only contain errors raised during
that execution. The problem with this approach is that error messages
already in errsvc from before the execution (whether stored by the
engine or not), if not yet dealt with by the app, will get lost.
I personally don't see any issue with this - if you've got this far and
no-one has handled the error, then it's probably not worth
remembering... BUT, there would be no harm in logging anything in the
error list before you clear it, just for completeness.
Actually, I do have a concern about this. If someone is running a test
run, and wants to ignore errors along the way but capture them in the
error service for later dumping by the app, this means we will lose
this data. Presumably they were running these checkpoints multiple
times for a reason. Logging it is ok, but it makes it harder for the
user to see the errors.
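For concreteness, option 1 together with the log-before-clear suggestion
would look roughly like the stand-in sketch below; clear_error_list() is
the errsvc call named above, everything else here is illustrative:

    # Stand-in sketch of option 1, not the real engine/errsvc code.
    import logging

    _ERRORS = []              # stand-in for errsvc's single error list

    def clear_error_list():   # stand-in for errsvc.clear_error_list()
        del _ERRORS[:]

    def execute_checkpoints(checkpoints):
        # Log anything still sitting on the list so it is not silently
        # lost, then clear it before running this batch of checkpoints.
        for error in _ERRORS:
            logging.warning("unhandled error from an earlier run: %r", error)
        clear_error_list()
        # ... run the checkpoints; any new failures are appended to
        # _ERRORS, so afterwards the list only holds this run's errors.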
3) Move the responsibility to the application to manage the
error list. The application should either clear it or remove the
ErrorInfo objects it is not interested in. engine.execute_checkpoint()
will simply append to the error list, and return the list of
checkpoint names that failed.
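Under option 3 the cleanup moves to the application side; a stand-in
sketch (plain tuples instead of real ErrorInfo objects):

    # Illustrative only: the engine just appends and reports the failed
    # checkpoint names; pruning the list is the application's job.
    errors = [("D", Exception("failure from run 1")),
              ("D", Exception("failure from run 2"))]

    failed = ["D"]            # what execute_checkpoints() would have returned

    # The application handles the failures it cares about, then removes
    # those entries itself so they do not pile up across runs:
    errors[:] = [entry for entry in errors if entry[0] not in failed]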
It's everyone's responsibility to look for errors - so before
Engine.execute() is called, the Application really should have looked
for errors that might have occurred so far anyway.
I thought the whole idea was to set the stop_on_error bit so that the
execution engine could determine what to do. And, the app then takes
the error info objects and does what it needs to do. How would the app
check the errors before the Engine.execute call in between
checkpoints? It couldn't, and I would think it would be hard for the
app to know which ErrorInfo objects it is interested in.
For the case where the stop_on_error flag is false, the engine will
execute all the checkpoints despite errors. Errors from all checkpoints
will be stored in errsvc. When the whole list of checkpoints has been
executed, errsvc will contain all the exceptions raised, and the
application will then be able to check for the errors. The application
cannot check for errors during checkpoint execution.
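As a stand-in sketch (not the real engine code), the stop_on_error flag
only changes whether the loop stops early; either way the application
can only inspect the accumulated errors after execute_checkpoints()
returns:

    # Stand-in sketch only: how stop_on_error changes checkpoint execution.
    def execute_checkpoints(checkpoints, errors, stop_on_error=True):
        for checkpoint in checkpoints:
            try:
                checkpoint.execute()
            except Exception as ex:
                errors.append((checkpoint.name, ex))
                if stop_on_error:
                    break      # stop at the first failure
        # With stop_on_error=False every checkpoint runs despite failures,
        # and the application can only examine the accumulated errors
        # after this call returns.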
Seems to me that if the app is really running the same checkpoint
multiple times, we need a way to differentiate these invocations. Even
beyond error handling, don't we have the same issue with logging? From
your design spec for the engine we are relying on the checkpoint name,
which in this case would be the same for this scenario.
With logging, there could be additional context, such as other
information that is logged even when there is no failure. Most
importantly, we are always guaranteed that the latest information is
appended to the log. That means that, reading the log, you always know
the latest failure is at the bottom. However, with errsvc, the
application should not be aware of how errors are stored internally.
errsvc has public APIs for accessing the errors, but those APIs do not
guarantee the order in which errors are returned.
Thanks,
--Karen
sarah
_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss