[ 
https://issues.apache.org/jira/browse/IGNITE-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101317#comment-16101317
 ] 

Semen Boikov commented on IGNITE-5811:
--------------------------------------

I think these issues can also cause a cluster hang:
- OutOfMemory errors (Java's OutOfMemoryError and probably our IgniteOutOfMemoryException?)
- errors with persistent storage
- transaction deadlocks

Also, I don't think we really need a 'policy': the node should be stopped 
anyway. What we can provide is a user callback, something like 'beforeNodeStop'.
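
To make the callback idea concrete, here is a minimal sketch of how it might 
look. Everything in it is hypothetical: NodeStopListener, FailureKind and 
SketchNode are illustrative names, not existing Ignite classes, and only 
'beforeNodeStop' comes from the suggestion above. The sketch assumes the stop 
is unconditional, so the callback can observe the failure but cannot veto it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical failure kinds, taken from the cases mentioned in this thread. */
enum FailureKind { OUT_OF_MEMORY, STORAGE_ERROR, TX_DEADLOCK, JAVA_DEADLOCK }

/** Hypothetical user callback; not an existing Ignite interface. */
interface NodeStopListener {
    void beforeNodeStop(FailureKind kind, Throwable cause);
}

/** Stand-in for a node: it always stops, listeners are only notified first. */
class SketchNode {
    private final List<NodeStopListener> listeners = new ArrayList<>();
    private boolean stopped;

    void addListener(NodeStopListener listener) {
        listeners.add(listener);
    }

    boolean isStopped() {
        return stopped;
    }

    void onCriticalFailure(FailureKind kind, Throwable cause) {
        if (stopped)
            return; // Already stopped, nothing left to do.

        for (NodeStopListener listener : listeners) {
            try {
                listener.beforeNodeStop(kind, cause);
            }
            catch (RuntimeException e) {
                // A failing callback must not keep a broken node alive.
                System.err.println("beforeNodeStop callback failed: " + e);
            }
        }

        stopped = true; // Real code would release resources and leave the cluster here.
    }
}

class NodeStopSketch {
    public static void main(String[] args) {
        SketchNode node = new SketchNode();

        node.addListener((kind, cause) ->
            System.out.println("Node is about to stop: " + kind + ", cause: " + cause));

        node.onCriticalFailure(FailureKind.STORAGE_ERROR, new java.io.IOException("disk failure"));

        System.out.println("Stopped: " + node.isStopped());
    }
}
```

Exceptions thrown by a callback are logged and swallowed in this sketch, so 
user code cannot turn the unconditional stop back into a configurable policy.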

Thanks

> Detect internal Ignite problems (java-level deadlock, hangs, etc) and act 
> according to a policy configured.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5811
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5811
>             Project: Ignite
>          Issue Type: New Feature
>            Reporter: Yakov Zhdanov
>              Labels: usability
>
> This has something in common with the segmentation policy we currently have. 
> The user should be notified of a deadlock and the node should take an action 
> (stop by default).
> Ignite may also react to internal errors and hangs in the same way: fire an 
> event and take the appropriate action.
> Current list of cases when node should (by default) stop itself:
> # Discovery reports segmentation (already implemented)
> # Critical discovery thread fails (already implemented)
> # NIO communication thread fails (already implemented)
> The following need to be added:
> # Java-deadlock detected
> # Internal threads stuck (no progress on current tasks during defined period)
> # ExchangeWorker exits with error
> We need to rework the handling of all the situations above so that they use 
> the same mechanism and the node takes the action according to a configured policy.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
