Re: [caiman-discuss] Engine's use of errsvc

Keith Mitchell Mon, 19 Jul 2010 17:34:31 -0700

It seems like it's in the spirit of the errsvc to have the code that deals with the exception be the code that removes the exception from the errsvc. The engine is not dealing with the exceptions; it's simply recording them.

Regardless, it's a bug on the app's side if it's not checking the errsvc for exceptions after a call to blocking execute, or from within the callback in the case of a non-blocking execute. And it's not a bug that the engine can do anything about sanely, so I'm fine with clearing if that's truly better.


- Keith

On 07/19/10 03:36 PM, Karen Tung wrote:

All the discussion on this so far seems to indicate that everybody agrees
the application should deal with the errors as they are reported.
So, it would be OK for the execute_checkpoint call in
the engine to just clear all the errors before it starts
executing anything.

If I don't hear any objections, I will clarify this in the engine
design spec, and implement it in the code.

Thanks,

--Karen

On 07/19/10 14:49, Sarah Jelinek wrote:
Hi Karen,



On 07/19/10 12:34 PM, Karen Tung wrote:
Hi Sarah,

Comments inline.

On 07/19/10 10:26, Sarah Jelinek wrote:
Hi Karen,

On 07/17/10 01:30 AM, Karen Tung wrote:
During the implementation of the engine, we found a problem
with using errsvc.

Background:
------------------------
-  Only 1 instance of errsvc exists in the name space
of the application and all modules and libraries it uses.
- Errors (instances of ErrorInfo) are stored in one single list called
_ERRORS in errsvc.py.
- Engine is going to use errsvc to store exception(s) raised
by checkpoints' execute() method.  Engine will use checkpoint name
as the mod_id for the ErrorInfo objects so application can
easily identify which checkpoint failed.

Problem:
-----------------
- If the application chooses to execute the same checkpoint multiple
times, and all the executions fail, multiple ErrorInfo with the
same mod_id will be added to the list of errors.  Application
cannot easily figure out which ErrorInfo belongs to which
invocation of that same checkpoint.

How likely is this scenario to actually happen? I assume that what you are saying is that only 1 checkpoint of this type is registered but the app runs it multiple times? Is it likely that the same code paths would be run in the multiple invocations? Even if the mod_id is the same, the error 'stack' would be different, wouldn't it, with multiple runs of the same checkpoint? Wouldn't it be likely that something would be different, such as input with the multiple invocations? I think there has to be some differentiating data even if you run the same checkpoint multiple times. Maybe I am not seeing a scenario where this could happen without anything different?

The engine allows the same set of checkpoints to re-run. So, depending on how the application chooses to run the checkpoints, the same checkpoint(s) failing at different execute_checkpoint() calls can occur. We are using the checkpoint name, which is supposed to be unique across the checkpoints, as the mod_id value for the errsvc. As you mentioned, even if the same checkpoint failed multiple times, the stack trace would be different. However, the problem is that the application will not be able to distinguish which invocation of the checkpoint the error comes from.

For example:
- Application calls execute_checkpoints() to run checkpoints A, B, C, D. Checkpoint D raises an exception, which is saved in errsvc. At this point, errsvc has 1 ErrorInfo on its list with the mod_id "D", and the associated exception.
- Assume the application "looked at" the exception but didn't clear it from the errsvc.
- Application calls execute_checkpoints() to run checkpoints B, C, D. Checkpoint D raises an exception again. At this point, errsvc has 2 ErrorInfo on its list, both with mod_id "D". The exceptions stored in the 2 ErrorInfo might be different. When the engine returns control to the application and informs it that checkpoint D failed, the application will search the errsvc for mod_id of D, and it will find 2 failures this time. How does it know which one is associated with the latest invocation?
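
The scenario above can be sketched in a few lines of Python. This is a minimal stand-in, not the real errsvc module: the ErrorInfo constructor shown here and the helpers record_error() and errors_for() are assumptions made up for illustration; only the names ErrorInfo, _ERRORS and mod_id come from the thread.

```python
# Hypothetical stand-in for errsvc; the real module's API may differ.
class ErrorInfo:
    def __init__(self, mod_id, exception):
        self.mod_id = mod_id          # checkpoint name, per the engine design
        self.exception = exception    # exception raised by the checkpoint


_ERRORS = []  # one shared list for the whole application


def record_error(mod_id, exception):
    """Append an error, the way the engine would after a failed checkpoint."""
    _ERRORS.append(ErrorInfo(mod_id, exception))


def errors_for(mod_id):
    """Return every stored error whose mod_id matches."""
    return [err for err in _ERRORS if err.mod_id == mod_id]


# First run (A, B, C, D): checkpoint D fails.
record_error("D", RuntimeError("failure in first run"))
# The application looks at the error but does not clear it.
# Second run (B, C, D): checkpoint D fails again.
record_error("D", ValueError("failure in second run"))

matches = errors_for("D")
print(len(matches))  # 2: both failures carry the same mod_id
```

Nothing in either entry says which run produced it, which is the ambiguity being discussed.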

Well.. it seems broken to me that an app just looks at the exception but doesn't do anything with it. Even if it is ignoring errors it should clear the error from the ErrorInfo array. I assumed you meant running a checkpoint twice within the same invocation of the app.

Possible Solutions:
-------------------------------
1) engine.execute_checkpoint() will always call
errsvc.clear_error_list() before it executes any checkpoint.
This way, when execution completes, the errsvc
will only contain errors raised during that execution.
The problem with this approach is that
error messages in errsvc that were not stored by the engine prior
to the execution, and not dealt with by the app, will get lost.

I personally don't see any issue with this - if you've got this far and no-one has handled the error, then it's probably not worth remembering... BUT, there would be no harm in logging anything in the error list before you clear it, just for completeness.

Actually, I do have a concern about this. If someone is running a test run, and wants to ignore errors along the way but capture them in the error service for later dumping by the app, this means we will lose this data. Presumably they were running these checkpoints multiple times for a reason. Logging it is ok, but it makes it harder for the user to see the errors.

3) Move the responsibility to the application to manage the
error list. They should either clear it or remove the ErrorInfo objects
they are not interested in.  The engine.execute_checkpoint()
will simply append to the error list, and return the list of
checkpoint names that failed.

It's everyone's responsibility to look for errors - so before Engine.execute() is called, the Application really should have looked for errors that might have occurred so far anyway.

I thought the whole idea was to set the stop_on_error bit so that the execution engine could determine what to do. And, the app then takes the error info objects and does what it needs to do. How would the app check the errors before the Engine.execute call in between checkpoints? It couldn't, and I would think it would be hard for the app to know which ErrorInfo objects it is interested in.

For the case where the stop_on_error flag is false, the engine will execute all the checkpoints despite errors. Errors from all checkpoints will be stored in the errsvc. When the whole list of checkpoints has been executed, the errsvc will contain all the exceptions raised. The application will then be able to check for the errors. The application cannot check for errors during checkpoint execution.
Right, this makes sense.

Seems to me if the app is really running the same checkpoint multiple times that we need a way to differentiate these invocations. Even beyond error handling don't we have the same issues with logging? From your design spec for the engine we are relying on the checkpoint name, which in this case would be the same for this scenario.

With logging, there could be additional context, such as other information that's logged, even if there's no failure. Most importantly, we always guarantee that the latest information is appended to the log. That means, reading the log, you always know the latest failure is at the bottom of the log.
However, with errsvc, the application should not be aware of how errors are stored internally. errsvc has public APIs for accessing the errors, but those APIs do not guarantee the order of errors returned.

Well.. if the scenario you describe above is what we are concerned about then I would think the application must do something with the errsvc errors, even if it is just browsing them. I would think it would be up to the application.
sarah
Thanks,

--Karen
sarah


_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss