[Tibor Biro] One
idea is to monitor the executing thread and terminate it if it runs longer than
a configurable amount of time. The application should be able to set this value,
but an override at the Executor level is probably
desirable as well. One problem is that some machines take longer than others to
execute the same work, so it might be useful to scale the wait time by the
computer’s speed. Another idea I’ve
been toying with is to require long-running threads to raise status events
containing a “percent done” value and maybe some other custom
data. The monitoring thread would then be able to tell whether the thread is dead
or still alive and just taking longer to complete. The events could carry enough
information to be used as checkpoints, but this would be up to the
implementation of each application. The two
approaches could also be combined. I wouldn’t mind exploring
other ideas as well.
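
To make the two monitoring ideas concrete, here is a rough sketch in Python of how a monitor might combine a hard time limit with "percent done" status events. It is not Alchemi code: `ThreadWatch`, `effective_timeout`, `max_silence` and `speed_factor` are names made up for illustration, and the rule for scaling the limit by machine speed is only an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


def effective_timeout(app_timeout: float,
                      executor_override: Optional[float] = None,
                      speed_factor: float = 1.0) -> float:
    """Pick the time limit in seconds for one thread.

    The Executor-level override wins over the application value when it is set.
    speed_factor is a hypothetical multiplier (> 1 for slower machines).
    """
    base = executor_override if executor_override is not None else app_timeout
    return base * speed_factor


@dataclass
class ThreadWatch:
    """Bookkeeping a monitor could keep for one executing thread (hypothetical)."""
    started_at: float
    max_runtime: float   # hard limit, in seconds
    max_silence: float   # longest allowed gap between status events, in seconds
    last_event_at: float = field(init=False)
    percent_done: float = 0.0

    def __post_init__(self) -> None:
        # Until the first status event arrives, measure silence from the start.
        self.last_event_at = self.started_at

    def on_status(self, percent_done: float, now: float) -> None:
        """Record a "percent done" status event raised by the thread."""
        self.percent_done = percent_done
        self.last_event_at = now

    def verdict(self, now: float) -> str:
        """Return 'timed_out', 'stalled' (no recent event) or 'ok'."""
        if now - self.started_at > self.max_runtime:
            return "timed_out"
        if now - self.last_event_at > self.max_silence:
            return "stalled"
        return "ok"
```

A monitoring loop would call `verdict()` periodically and terminate the thread on `timed_out` or `stalled`. Timestamps are passed in explicitly so the logic can be tested without real clocks.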
[Tibor Biro] The Executor should persist the executed thread
in case the Manager is not available and send it back once a connection is
made. You should investigate the points of failure in this scenario and address
them as they are discovered.
[Tibor Biro] Once you decide which area to work on, let’s
start a thread on the SourceForge forums so we can iron out the details and
document what other ideas were considered.
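
As a starting point for that discussion, the persist-and-resend behaviour suggested above for the Executor could look roughly like the Python sketch below. It is not Alchemi code: `ResultOutbox`, the JSON files on disk, and the `send` callback raising `ConnectionError` when the Manager is unreachable are all assumptions made for illustration.

```python
import json
import os
from typing import Callable


class ResultOutbox:
    """File-backed queue of finished thread results (hypothetical helper)."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def save(self, thread_id: str, result: dict) -> None:
        """Persist a result so it survives until the Manager is reachable."""
        tmp = os.path.join(self.directory, thread_id + ".tmp")
        final = os.path.join(self.directory, thread_id + ".json")
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(result, f)
        # Rename only after the write finished, so a crash never leaves
        # a half-written .json file behind.
        os.replace(tmp, final)

    def flush(self, send: Callable[[str, dict], None]) -> int:
        """Try to send every stored result; return how many were delivered."""
        sent = 0
        for name in sorted(os.listdir(self.directory)):
            if not name.endswith(".json"):
                continue
            path = os.path.join(self.directory, name)
            with open(path, encoding="utf-8") as f:
                result = json.load(f)
            try:
                send(name[:-len(".json")], result)
            except ConnectionError:
                break  # Manager still unavailable; keep the rest for later
            os.remove(path)  # delete only after a successful send
            sent += 1
        return sent
```

One point of failure is already visible here: if the Executor dies after `send` succeeds but before the file is removed, the Manager receives the same result twice, so it would need to tolerate duplicates.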
Tibor
