Igniters,

Internal problems may, and unfortunately do, cause unexpected cluster behavior. We should define how the cluster behaves when any of these internal problems occurs.

Well-known internal problems can be split into:

1) OOM or any other cause of a node crash

2) Situations that require a graceful node shutdown with a custom notification
- IgniteOutOfMemoryException
- Persistence errors
- ExchangeWorker exits with an error

3) Performance issues that should be covered by metrics
- GC STW duration
- Timed-out tasks and jobs
- TX deadlock
- Hung TX (waiting for some service)
- Java deadlocks

I created a special issue [1] to make sure all these metrics will be presented in WebConsole or VisorConsole (which is preferred?)

4) Situations that require an external monitoring implementation
- GC STW duration exceeds the maximum allowed length (the node should be stopped before the STW pause finishes)

All these problems were reported by different people at different times, so we should reanalyze each of them and, possibly, find better ways to solve them than those described in the issues.

P.S. IEP-7 [2] already contains 9 issues; feel free to mention something else :)

[1] https://issues.apache.org/jira/browse/IGNITE-6961
[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-7%3A+Ignite+internal+problems+detection
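
To make two of the metrics from item 3 more concrete, here is a rough sketch of how "Java deadlocks" and "GC STW duration" could be probed from inside a node. It is only an illustration under assumptions: the class and method names (InternalProblemProbes, deadlockedThreadCount, pauseMillis, probeOnce) are hypothetical and are not Ignite API. The only real API used is the JDK's ThreadMXBean. The pause estimate is the common "sleep and measure the drift" heuristic, so it catches any long JVM stall, not only GC.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper for illustration only; not part of Ignite. */
final class InternalProblemProbes {
    private InternalProblemProbes() {
        // Static helpers only.
    }

    /** Returns the number of threads currently in a Java-level deadlock (0 if none). */
    static int deadlockedThreadCount() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();

        // findDeadlockedThreads() returns null when no deadlock is detected.
        long[] ids = bean.findDeadlockedThreads();

        return ids == null ? 0 : ids.length;
    }

    /** Estimates a pause as the time a sleep overran its requested duration. */
    static long pauseMillis(long expectedSleepMs, long actualElapsedMs) {
        return Math.max(0L, actualElapsedMs - expectedSleepMs);
    }

    /** Sleeps once and returns the estimated pause in milliseconds. */
    static long probeOnce(long sleepMs) throws InterruptedException {
        long start = System.nanoTime();

        Thread.sleep(sleepMs);

        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;

        return pauseMillis(sleepMs, elapsedMs);
    }
}
```

A background thread could call probeOnce(...) in a loop and publish the largest value as a metric. Note that such an in-process probe only reports a pause after it has ended, which is why item 4 needs external monitoring.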