Vyacheslav Koptilin created IGNITE-20451:
--------------------------------------------

             Summary: Introduce WorkerRegistry
                 Key: IGNITE-20451
                 URL: https://issues.apache.org/jira/browse/IGNITE-20451
             Project: Ignite
          Issue Type: Improvement
            Reporter: Vyacheslav Koptilin


Each Ignite node has a number of system-critical threads. We should implement a 
periodic check that calls the failure handler when one of the following 
conditions has been detected:
 - A critical thread is no longer alive.
 - A critical thread hangs for a long time, e.g. while executing a task 
extracted from the task queue.
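
The two conditions above could be checked roughly as follows. This is only a 
minimal sketch: the class WorkerHealth, the Failure enum and the heartbeat 
timestamp are hypothetical names used for illustration, not existing Ignite 
API. It assumes each critical thread records a timestamp whenever it makes 
progress.

```java
import java.util.Optional;

/** Hypothetical view of a critical worker: its thread plus a heartbeat timestamp. */
final class WorkerHealth {
    /** Possible check outcomes; the names are illustrative only. */
    enum Failure { TERMINATED, BLOCKED }

    private final Thread thread;
    private volatile long lastHeartbeatMillis;

    WorkerHealth(Thread thread, long nowMillis) {
        this.thread = thread;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** The worker calls this whenever it makes progress, e.g. between queued tasks. */
    void heartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** Returns the detected failure, or empty if the worker looks healthy. */
    Optional<Failure> check(long nowMillis, long maxSilenceMillis) {
        // Condition 1: the thread is not alive (in this sketch, a thread that
        // was never started is also reported as not alive).
        if (!thread.isAlive())
            return Optional.of(Failure.TERMINATED);

        // Condition 2: no heartbeat for longer than the allowed silence period.
        if (nowMillis - lastHeartbeatMillis > maxSilenceMillis)
            return Optional.of(Failure.BLOCKED);

        return Optional.empty();
    }
}
```

Passing the current time in as a parameter keeps the check deterministic; a 
real implementation would probably read a monotonic clock instead.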

If a failure condition is detected, the call stacks of all threads should be 
logged before invoking the failure handler.
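
A possible shape for the "log first, then invoke the handler" ordering is 
sketched below. FailureReporter and both Consumer parameters are hypothetical 
stand-ins for the real logger and failure handler; only 
Thread.getAllStackTraces() is standard JDK API.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper: dumps all thread stacks, then calls the failure handler. */
final class FailureReporter {
    /** Formats the call stacks of all live threads into one string. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();

        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();

            sb.append("Thread [name=").append(t.getName())
              .append(", state=").append(t.getState()).append("]\n");

            for (StackTraceElement frame : e.getValue())
                sb.append("    at ").append(frame).append('\n');
        }

        return sb.toString();
    }

    /** Logs the dump first, then hands the failure reason to the handler. */
    static void report(String reason, Consumer<String> log, Consumer<String> failureHandler) {
        log.accept("Critical failure: " + reason + "\n" + dumpAllThreads());
        failureHandler.accept(reason);
    }
}
```

The dump is taken before the handler runs because the handler may stop the 
node, after which the diagnostic information would be lost.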

Implementations based on a separate diagnostic thread seem fragile, because 
this thread becomes a vulnerable point with respect to thread termination and 
CPU resource starvation. So we are going to use a self-monitoring approach: 
critical threads themselves should monitor each other.
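
One way the self-monitoring approach might look is sketched below. All names 
(PeerMonitor, Worker, onProgress) are hypothetical, and the round-robin peer 
selection is just one possible policy. Each critical worker calls onProgress 
from its own loop; the call records the worker's heartbeat and then inspects 
a single peer, so no dedicated diagnostic thread is needed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

/** Hypothetical registry in which critical workers check each other. */
final class PeerMonitor {
    /** A registered critical worker with its last heartbeat timestamp. */
    static final class Worker {
        final String name;
        final Thread thread;
        volatile long lastHeartbeatMillis;

        Worker(String name, Thread thread, long nowMillis) {
            this.name = name;
            this.thread = thread;
            this.lastHeartbeatMillis = nowMillis;
        }
    }

    private final List<Worker> workers = new CopyOnWriteArrayList<>();
    private final AtomicInteger cursor = new AtomicInteger();
    private final long maxSilenceMillis;
    private final BiConsumer<Worker, String> onFailure;

    PeerMonitor(long maxSilenceMillis, BiConsumer<Worker, String> onFailure) {
        this.maxSilenceMillis = maxSilenceMillis;
        this.onFailure = onFailure;
    }

    void register(Worker w) {
        workers.add(w);
    }

    /** Called by a worker from its own loop: records progress, then checks one peer. */
    void onProgress(Worker self, long nowMillis) {
        self.lastHeartbeatMillis = nowMillis;

        int n = workers.size();
        if (n == 0)
            return;

        // Round-robin over the registry so that every worker gets inspected eventually.
        Worker peer = workers.get(Math.floorMod(cursor.getAndIncrement(), n));
        if (peer == self)
            return; // A worker cannot usefully check itself.

        if (!peer.thread.isAlive())
            onFailure.accept(peer, "terminated");
        else if (nowMillis - peer.lastHeartbeatMillis > maxSilenceMillis)
            onFailure.accept(peer, "blocked");
    }
}
```

With this scheme a hung worker is noticed as long as at least one other 
critical worker keeps making progress, at the cost of a small amount of extra 
work on each worker's hot path.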




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
