[ https://issues.apache.org/jira/browse/IGNITE-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101317#comment-16101317 ]
Semen Boikov commented on IGNITE-5811:
--------------------------------------
I think these issues can cause a cluster hang as well:
- OutOfMemory errors (Java's OutOfMemoryError, and probably our own IgniteOutOfMemoryException?)
- errors with persistent storage
- transaction deadlocks
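Failures like these tend to cause a hang rather than a clean stop when the thread that hits them simply dies and the rest of the cluster keeps waiting on it. A minimal sketch of one way to avoid that, assuming a hypothetical `CriticalWorker` base class and `FailureListener` interface (neither is an existing Ignite API): the worker reports any `Throwable`, `OutOfMemoryError` included, and also treats a normal return as a failure, since a critical worker is not expected to exit while the node is running.

```java
/** Hypothetical callback for a critical worker that terminated abnormally. */
interface FailureListener {
    void onCriticalFailure(String workerName, Throwable cause);
}

/** Hypothetical base class: a critical worker never dies without reporting why. */
abstract class CriticalWorker implements Runnable {
    private final String name;
    private final FailureListener listener;

    CriticalWorker(String name, FailureListener listener) {
        this.name = name;
        this.listener = listener;
    }

    /** The worker loop itself; may throw anything, including OutOfMemoryError. */
    protected abstract void body() throws Exception;

    @Override
    public final void run() {
        Throwable failure;
        try {
            body();
            // Returning normally is also unexpected for a critical worker.
            failure = new IllegalStateException("Worker exited unexpectedly");
        } catch (Throwable t) {
            // Catch Errors as well as exceptions: a thread that dies silently
            // leaves the rest of the cluster waiting on it.
            failure = t;
        }
        listener.onCriticalFailure(name, failure);
    }
}
```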
Also, I don't think we really need a 'policy': the node should be stopped anyway. What
we can provide is a user callback, something like 'beforeNodeStop'.
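With a callback instead of a policy, the failure path is fixed (always stop) and only a pre-stop hook is pluggable. A sketch under that assumption; `NodeStopCallback` and `FailingNode` are hypothetical names, and `beforeNodeStop` is only the working name suggested here, not an existing Ignite method:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical user callback invoked just before a failed node stops itself. */
interface NodeStopCallback {
    void beforeNodeStop(String reason, Throwable cause);
}

/** Hypothetical node facade: on a critical failure it always stops. */
class FailingNode {
    private final List<NodeStopCallback> callbacks = new CopyOnWriteArrayList<>();
    private volatile boolean stopped;

    void addCallback(NodeStopCallback cb) {
        callbacks.add(cb);
    }

    /** No policy to consult: notify the callbacks, then stop unconditionally. */
    void onCriticalFailure(String reason, Throwable cause) {
        try {
            for (NodeStopCallback cb : callbacks) {
                try {
                    cb.beforeNodeStop(reason, cause);
                } catch (RuntimeException e) {
                    // Ignore a broken callback so the remaining ones still run.
                }
            }
        } finally {
            stop(); // The node stops no matter what the callbacks did.
        }
    }

    private void stop() {
        stopped = true; // Placeholder for the real shutdown sequence.
    }

    boolean isStopped() {
        return stopped;
    }
}
```

The `finally` block is the design point: whatever the callbacks do, including throwing, the node still stops.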
Thanks
> Detect internal Ignite problems (java-level deadlock, hangs, etc) and act
> according to a policy configured.
> -----------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-5811
> URL: https://issues.apache.org/jira/browse/IGNITE-5811
> Project: Ignite
> Issue Type: New Feature
> Reporter: Yakov Zhdanov
> Labels: usability
>
> This has something in common with the segmentation policy we currently have. The user
> should be notified of a deadlock, and the node should take an action
> (stop by default).
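For the Java-level deadlock case, the JDK already provides detection: `ThreadMXBean.findDeadlockedThreads()` returns the IDs of threads deadlocked on object monitors or ownable synchronizers (`java.util.concurrent` locks), or `null` if there are none. A sketch of a check built on it; the `DeadlockDetector` class and its notify/stop hooks are hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical check for Java-level deadlocks using the standard JMX thread bean. */
class DeadlockDetector {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

    /** Returns the names of deadlocked threads, or an empty list if there are none. */
    List<String> findDeadlockedThreadNames() {
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
        List<String> names = new ArrayList<>();
        if (ids == null)
            return names;
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) // The thread may have terminated in the meantime.
                names.add(info.getThreadName());
        }
        return names;
    }

    /** Notifies the user, then applies the default action (stop) if a deadlock is found. */
    boolean check(Consumer<List<String>> notifyUser, Runnable stopNode) {
        List<String> deadlocked = findDeadlockedThreadNames();
        if (deadlocked.isEmpty())
            return false;
        notifyUser.accept(deadlocked);
        stopNode.run();
        return true;
    }
}
```

Running `check` periodically (for example from a scheduled executor) would be one option; the interval is a tuning choice not specified in this issue. Note that this only sees thread cycles inside one JVM, so it does not cover the transaction deadlocks mentioned in the comment.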
> Ignite may also react to internal errors and hangs in the same way: fire an
> event and take the appropriate action.
> Current list of cases in which the node should (by default) stop itself:
> # Discovery reports segmentation (already implemented)
> # Critical discovery thread fails (already implemented)
> # NIO communication thread fails (already implemented)
> The following need to be added:
> # Java-level deadlock detected
> # Internal threads stuck (no progress on current tasks during defined period)
> # ExchangeWorker exits with error
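The "internal threads stuck" case could be detected with a progress heartbeat: each internal worker records a timestamp whenever it makes progress, and a separate checker flags workers whose last progress is older than the defined period. A minimal sketch, assuming a hypothetical `ProgressWatchdog` class; time is passed in explicitly (in production it would come from `System.nanoTime()`) so the logic is easy to test:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical watchdog: workers record progress, a checker flags those that stalled. */
class ProgressWatchdog {
    private final Map<String, Long> lastProgressNanos = new ConcurrentHashMap<>();
    private final long maxStallNanos;

    ProgressWatchdog(long maxStallNanos) {
        this.maxStallNanos = maxStallNanos;
    }

    /** Called by a worker each time it makes progress on its current task. */
    void heartbeat(String worker, long nowNanos) {
        lastProgressNanos.put(worker, nowNanos);
    }

    /** Called when a worker exits normally, so it is no longer monitored. */
    void unregister(String worker) {
        lastProgressNanos.remove(worker);
    }

    /** Reports every worker with no progress for longer than the allowed period. */
    int check(long nowNanos, BiConsumer<String, Long> onStuck) {
        int stuck = 0;
        for (Map.Entry<String, Long> e : lastProgressNanos.entrySet()) {
            long stalled = nowNanos - e.getValue(); // Subtraction is safe for nanoTime values.
            if (stalled > maxStallNanos) {
                onStuck.accept(e.getKey(), stalled);
                stuck++;
            }
        }
        return stuck;
    }
}
```

A worker that is legitimately idle, waiting for new tasks, would need to keep heartbeating while it waits; otherwise it would be reported as stuck.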
> We need to rework the handling of all the situations above so that they use the same
> mechanism and make the node take the action according to a configured policy.
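A sketch of that single mechanism, under the assumption that every detector (segmentation, failed discovery or NIO thread, deadlock check, stuck-thread watchdog, ExchangeWorker exit) funnels into one hypothetical `NodeFailureDispatcher`, which fires an event to listeners and then applies the configured action, with stop as the default. All type names here are illustrative, not Ignite APIs:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical failure kinds, one per case listed in the issue. */
enum NodeFailureKind {
    SEGMENTATION, DISCOVERY_THREAD_FAILED, NIO_THREAD_FAILED,
    JAVA_DEADLOCK, THREAD_STUCK, EXCHANGE_WORKER_FAILED
}

/** Hypothetical actions a configured policy may choose. */
enum FailureAction { STOP, NOOP }

/** Hypothetical single entry point: fire an event, then apply the configured action. */
class NodeFailureDispatcher {
    private final List<Consumer<NodeFailureKind>> listeners = new CopyOnWriteArrayList<>();
    private final FailureAction configuredAction;
    private final Runnable stopNode;

    NodeFailureDispatcher(FailureAction configuredAction, Runnable stopNode) {
        this.configuredAction = configuredAction;
        this.stopNode = stopNode;
    }

    /** Default policy: stop the node. */
    NodeFailureDispatcher(Runnable stopNode) {
        this(FailureAction.STOP, stopNode);
    }

    void addListener(Consumer<NodeFailureKind> listener) {
        listeners.add(listener);
    }

    /** Every detector calls this instead of handling the failure on its own. */
    FailureAction onFailure(NodeFailureKind kind) {
        for (Consumer<NodeFailureKind> l : listeners)
            l.accept(kind); // The "fire event" step.
        if (configuredAction == FailureAction.STOP)
            stopNode.run();
        return configuredAction;
    }
}
```

Whether the configurable part should be a policy like this, or only the listener/callback hook, is exactly the point debated in the comment.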