Vyacheslav Koptilin created IGNITE-20451:
--------------------------------------------
Summary: Introduce WorkerRegistry
Key: IGNITE-20451
URL: https://issues.apache.org/jira/browse/IGNITE-20451
Project: Ignite
Issue Type: Improvement
Reporter: Vyacheslav Koptilin
Each Ignite node has a number of system-critical threads. We should implement a
periodic check that calls the failure handler when one of the following
conditions has been detected:
- A critical thread is no longer alive.
- A critical thread 'hangs' for a long time, e.g. while executing a task
taken from the task queue.
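The periodic check can be sketched as follows. This is a minimal illustration,
not Ignite's implementation: `CriticalWorker`, `check`, the heartbeat field and
the `maxBlockedMillis` threshold are hypothetical names, and "hangs for a long
time" is assumed to mean "has not updated its heartbeat within a configured
timeout". The current time is passed in explicitly so the logic is easy to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class WorkerHealthCheck {
    /** Hypothetical view of a critical worker: its thread plus the time of its last heartbeat. */
    static final class CriticalWorker {
        final String name;
        final Thread thread;
        // Assumed to be updated by the worker itself, e.g. each time it takes a task from its queue.
        final AtomicLong lastHeartbeatMillis = new AtomicLong();

        CriticalWorker(String name, Thread thread, long nowMillis) {
            this.name = name;
            this.thread = thread;
            this.lastHeartbeatMillis.set(nowMillis);
        }

        void heartbeat(long nowMillis) {
            lastHeartbeatMillis.set(nowMillis);
        }
    }

    /** Returns one description per detected failure condition; an empty list means all workers look healthy. */
    static List<String> check(List<CriticalWorker> workers, long nowMillis, long maxBlockedMillis) {
        List<String> failures = new ArrayList<>();
        for (CriticalWorker w : workers) {
            long silentMillis = nowMillis - w.lastHeartbeatMillis.get();
            if (!w.thread.isAlive()) {
                // Condition 1: the critical thread is no longer alive.
                failures.add(w.name + ": thread terminated");
            } else if (silentMillis > maxBlockedMillis) {
                // Condition 2: the thread is alive but has not made progress for too long.
                failures.add(w.name + ": no heartbeat for " + silentMillis + " ms");
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread dead = new Thread(() -> {});
        dead.start();
        dead.join(); // the thread has terminated after join() returns
        Thread current = Thread.currentThread();
        List<CriticalWorker> workers = List.of(
            new CriticalWorker("dead-worker", dead, 0),
            new CriticalWorker("stuck-worker", current, 0),
            new CriticalWorker("healthy-worker", current, 9_500));
        // Prints the failures for "dead-worker" and "stuck-worker"; "healthy-worker" is within the timeout.
        check(workers, 10_000, 1_000).forEach(System.out::println);
    }
}
```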
When a failure condition is detected, the call stacks of all threads should be
logged before the failure handler is invoked.
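One way to meet the logging requirement is sketched below, assuming a
hypothetical `FailureHandler` callback and a plain `Consumer<String>` standing
in for the node's logger. The dump itself uses the standard
`Thread.getAllStackTraces()`, and the handler is only called after the dump has
been handed to the logger.

```java
import java.util.Map;
import java.util.function.Consumer;

public class FailureReporter {
    /** Hypothetical failure handler callback; a real node might stop itself or halt the JVM here. */
    interface FailureHandler {
        void onFailure(String reason);
    }

    /** Builds a textual dump of the call stacks of all live threads in this JVM. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Logs the thread dump first, then invokes the failure handler. */
    static void reportFailure(String reason, Consumer<String> log, FailureHandler handler) {
        log.accept("Critical worker failure: " + reason + "\n" + dumpAllThreads());
        handler.onFailure(reason);
    }

    public static void main(String[] args) {
        reportFailure("stuck-worker: no heartbeat for 10000 ms",
            System.err::println,
            reason -> System.out.println("failure handler invoked: " + reason));
    }
}
```

`ThreadMXBean.dumpAllThreads(true, true)` from `java.lang.management` is an
alternative that additionally reports held monitors and synchronizers, which
can help when a critical thread hangs on a lock.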
Implementations based on a separate diagnostic thread seem fragile, because
that thread becomes a vulnerable point with respect to thread termination and
CPU resource starvation. Instead, we should use a self-monitoring approach:
the critical threads themselves should monitor each other.
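A rough sketch of the self-monitoring idea, assuming each critical worker calls
back into a shared registry from its own loop (e.g. between tasks taken from
its queue): the worker records its own heartbeat and, at most once per check
interval, inspects its peers. `SelfMonitoringRegistry`, `onHeartbeat` and the
timeout parameters are hypothetical; the sketch only borrows the idea of a
worker registry from the ticket summary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

public class SelfMonitoringRegistry {
    /** A registered critical worker: its thread and the time of its last heartbeat. */
    static final class Worker {
        final Thread thread;
        final AtomicLong lastHeartbeatMillis;

        Worker(Thread thread, long nowMillis) {
            this.thread = thread;
            this.lastHeartbeatMillis = new AtomicLong(nowMillis);
        }
    }

    private final Map<String, Worker> workers = new ConcurrentHashMap<>();
    // Time of the last peer check; 0 means no check has run yet.
    private final AtomicLong lastCheckMillis = new AtomicLong();
    private final long checkIntervalMillis;
    private final long maxBlockedMillis;
    private final Consumer<String> failureHandler;

    SelfMonitoringRegistry(long checkIntervalMillis, long maxBlockedMillis, Consumer<String> failureHandler) {
        this.checkIntervalMillis = checkIntervalMillis;
        this.maxBlockedMillis = maxBlockedMillis;
        this.failureHandler = failureHandler;
    }

    void register(String name, Thread thread, long nowMillis) {
        workers.put(name, new Worker(thread, nowMillis));
    }

    /** Called by a critical worker from its own loop; there is no separate watchdog thread. */
    void onHeartbeat(String self, long nowMillis) {
        workers.get(self).lastHeartbeatMillis.set(nowMillis);

        long last = lastCheckMillis.get();
        // Only the worker that wins this CAS checks its peers during the current interval.
        if (nowMillis - last < checkIntervalMillis || !lastCheckMillis.compareAndSet(last, nowMillis)) {
            return;
        }
        for (Map.Entry<String, Worker> peer : workers.entrySet()) {
            if (peer.getKey().equals(self)) {
                continue; // the worker running this code is evidently alive and making progress
            }
            Worker w = peer.getValue();
            long silentMillis = nowMillis - w.lastHeartbeatMillis.get();
            if (!w.thread.isAlive()) {
                failureHandler.accept(peer.getKey() + ": thread terminated");
            } else if (silentMillis > maxBlockedMillis) {
                failureHandler.accept(peer.getKey() + ": no heartbeat for " + silentMillis + " ms");
            }
        }
    }

    public static void main(String[] args) {
        SelfMonitoringRegistry registry = new SelfMonitoringRegistry(100, 1_000, System.out::println);
        Thread me = Thread.currentThread();
        registry.register("a", me, 0);
        registry.register("b", me, 0);                   // never sends a heartbeat, so it will look blocked
        registry.register("c", new Thread(() -> {}), 0); // never started, so isAlive() is false
        registry.onHeartbeat("a", 50);    // check interval not yet elapsed: no peer check
        registry.onHeartbeat("a", 2_000); // "a" checks its peers and reports "b" and "c"
    }
}
```

The compare-and-set on `lastCheckMillis` lets only one worker per interval run
the check, so no dedicated diagnostic thread is needed. A limitation of this
sketch is that if every critical thread is blocked at the same time, no check
runs at all.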
--
This message was sent by Atlassian Jira
(v8.20.10#820010)