Hi all,

I am looking for ideas to improve Alchemi's resilience. As a first step I am trying to identify areas where improvements are necessary. Since this is just brainstorming, no idea is too wild or silly, so if you have any, let's hear them.

Scheduler features that improve resilience:
- schedule a thread on multiple executors and take the response from the first one. This improves the chances of a thread being executed.
- schedule a thread on multiple executors and compare the results before returning to the application. This improves the quality of the computation done by executors and weeds out executors that corrupt data.
- wait a given amount of time for a thread to be executed and, if no response is received, re-schedule the thread on another executor. This is an optimistic implementation of the first variation.

Executor features that improve resilience:
- if the executor is shut down nicely, release the running thread back to the manager.
- if the executor is killed, release the running thread back to the manager on startup.
- if the connection to the manager is lost (network issues, the manager being down, or whatever), re-connect and continue working on the existing thread.

Manager features that improve resilience:
- detect dead executors and re-schedule their threads.
- detect dead applications and stop running their threads.

Tibor
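
To make two of the scheduler ideas above more concrete (re-scheduling after a timeout, and comparing results from several executors), here is a rough sketch in Python. None of it is Alchemi code or Alchemi's API: an "executor" is assumed to be a plain callable that takes the thread's input and returns its result, and the names `run_with_reschedule`, `run_with_voting`, `healthy`, `corrupting` and `hung` are made up for illustration only.

```python
import concurrent.futures as cf
import time
from collections import Counter


def run_with_reschedule(work, executors, timeout_s):
    """Try executors one at a time; move to the next if one fails or is too slow."""
    pool = cf.ThreadPoolExecutor(max_workers=len(executors))
    try:
        for run_on_executor in executors:
            future = pool.submit(run_on_executor, work)
            try:
                return future.result(timeout=timeout_s)
            except cf.TimeoutError:
                continue  # no response in time: re-schedule on the next executor
            except Exception:
                continue  # executor raised: treat it like a dead executor
        raise RuntimeError("no executor returned a result")
    finally:
        # Do not block on executors that are still hung.
        pool.shutdown(wait=False)


def run_with_voting(work, executors, min_agree=2):
    """Run on every executor; accept a result only if min_agree of them match.

    Assumes results are hashable so they can be counted.
    """
    results = []
    with cf.ThreadPoolExecutor(max_workers=len(executors)) as pool:
        futures = [pool.submit(run, work) for run in executors]
        for future in cf.as_completed(futures):
            if future.exception() is None:
                results.append(future.result())
    if not results:
        raise RuntimeError("every executor failed")
    value, votes = Counter(results).most_common(1)[0]
    if votes < min_agree:
        raise RuntimeError("executors disagree on the result")
    return value


# Hypothetical stand-ins for real executors.
def healthy(work):
    return work * 2


def corrupting(work):
    return work * 2 + 1  # returns wrong data


def hung(work):
    time.sleep(0.3)  # simulates an executor that responds too late
    return work * 2
```

For example, `run_with_reschedule(21, [hung, healthy], timeout_s=0.05)` gives up on the slow executor and takes the answer from the second one, while `run_with_voting(21, [healthy, corrupting, healthy])` returns the value two executors agree on. The voting variant costs at least twice the compute per thread, so it probably only makes sense as an opt-in setting per application.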
