[ https://issues.apache.org/jira/browse/IGNITE-20451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin updated IGNITE-20451:
-----------------------------------------
Summary: Introduce WorkerRegistery (was: Introduce Introduce WorkerRegistery)
> Introduce WorkerRegistery
> -------------------------
>
> Key: IGNITE-20451
> URL: https://issues.apache.org/jira/browse/IGNITE-20451
> Project: Ignite
> Issue Type: Improvement
> Reporter: Vyacheslav Koptilin
> Priority: Major
> Labels: ignite-3
>
> Each Ignite node runs a number of system-critical threads. We should implement
> a periodic check that invokes the failure handler when either of the following
> conditions is detected:
> - A critical thread is no longer alive.
> - A critical thread 'hangs' for a long time, e.g. while executing a task
> extracted from the task queue.
> When a failure condition is detected, the call stacks of all threads should be
> logged before the failure handler is invoked.
> Implementations based on a separate diagnostic thread seem fragile, because that
> thread itself becomes a vulnerable point: it can be terminated or starved of CPU
> time just like any other thread. So we are going to use a self-monitoring
> approach instead: the critical threads themselves should monitor each other.
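A minimal Java sketch of the self-monitoring scheme described above, under assumptions the issue does not state: each critical worker publishes a heartbeat timestamp, and on every iteration of its own loop it checks all other registered workers for the two failure conditions (thread not alive, heartbeat older than a threshold). When a failure is found, it logs the call stacks of all threads before calling the failure handler. WorkerRegistrySketch, CriticalWorker, checkOthers, workerLoop, the 100 ms poll timeout, and modeling the failure handler as a plain Consumer<String> are all hypothetical choices made for illustration; this is not Ignite's actual API.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: names and structure are illustrative, not Ignite's API.
final class WorkerRegistrySketch {

    /** A registered critical thread plus the time of its last observed progress. */
    static final class CriticalWorker {
        final Thread thread;
        volatile long heartbeatNanos = System.nanoTime();

        CriticalWorker(Thread thread) {
            this.thread = thread;
        }

        /** Called by the worker itself whenever it makes progress. */
        void updateHeartbeat() {
            heartbeatNanos = System.nanoTime();
        }
    }

    private final Map<String, CriticalWorker> workers = new ConcurrentHashMap<>();
    private final long maxHeartbeatIntervalNanos;
    private final Consumer<String> failureHandler; // assumed stand-in for the node's failure handler

    WorkerRegistrySketch(long maxHeartbeatIntervalNanos, Consumer<String> failureHandler) {
        this.maxHeartbeatIntervalNanos = maxHeartbeatIntervalNanos;
        this.failureHandler = failureHandler;
    }

    CriticalWorker register(String name, Thread thread) {
        CriticalWorker worker = new CriticalWorker(thread);
        workers.put(name, worker);
        return worker;
    }

    /**
     * Run by each critical worker from its own loop: checks every other worker.
     * Returns a failure description, or null if all other workers look healthy.
     */
    String checkOthers(CriticalWorker self) {
        long now = System.nanoTime();
        for (Map.Entry<String, CriticalWorker> e : workers.entrySet()) {
            CriticalWorker w = e.getValue();
            if (w == self) {
                continue; // a thread cannot usefully watch itself
            }
            boolean dead = !w.thread.isAlive();
            boolean hung = now - w.heartbeatNanos > maxHeartbeatIntervalNanos;
            if (dead || hung) {
                String reason = e.getKey()
                        + (dead ? " is not alive" : " has not updated its heartbeat in time");
                dumpAllThreads();              // log all call stacks first...
                failureHandler.accept(reason); // ...then invoke the failure handler
                return reason;
            }
        }
        return null;
    }

    /** Logs the call stack of every live thread (to stderr in this sketch). */
    static void dumpAllThreads() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            System.err.println("Thread \"" + e.getKey().getName() + "\"");
            for (StackTraceElement frame : e.getValue()) {
                System.err.println("    at " + frame);
            }
        }
    }

    /** A critical worker's main loop: poll with a timeout so the heartbeat advances while idle. */
    static void workerLoop(WorkerRegistrySketch registry, CriticalWorker self,
                           BlockingQueue<Runnable> queue) throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            self.updateHeartbeat();
            registry.checkOthers(self); // watch the other critical threads
            Runnable task = queue.poll(100, TimeUnit.MILLISECONDS);
            if (task != null) {
                task.run(); // a task that never returns stops this worker's heartbeat
            }
        }
    }
}
```

The poll timeout has to stay well below the heartbeat threshold; otherwise a worker idling on an empty queue would be reported as hung. The sketch also reports a failure on every check that observes one, so a real implementation would presumably deduplicate reports. Note that self-monitoring only helps while at least one critical thread is still running its loop.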
--
This message was sent by Atlassian Jira
(v8.20.10#820010)