Ignite Enhancement Proposal #7 (Internal problems detection)

Anton Vinogradov Mon, 20 Nov 2017 07:53:41 -0800

Igniters,

Internal problems can, and unfortunately do, cause unexpected cluster
behavior.
We should define how the cluster behaves when any such internal problem
occurs.


Well-known internal problems can be split into:
1) OOM or any other cause of a node crash

2) Situations requiring a graceful node shutdown with a custom notification
- IgniteOutOfMemoryException
- Persistence errors
- ExchangeWorker exits with error

3) Performance issues that should be covered by metrics
- GC STW duration
- Timed out tasks and jobs
- TX deadlock
- Hung TX (waiting for some service)
- Java deadlocks
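
As a rough illustration of how two of the metrics above (Java deadlocks and
GC time) could be sampled, here is a minimal sketch that uses only the
standard JDK management beans. The class name InternalProblemProbe and its
method names are hypothetical and are not part of Ignite. Note that
getCollectionTime() reports accumulated collection time per collector, which
only approximates STW duration for collectors that do part of their work
concurrently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not an Ignite class: samples JVM-level indicators.
final class InternalProblemProbe {
    /** Returns the number of threads currently stuck in a Java-level deadlock (0 if none). */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // Covers monitors and ownable synchronizers; returns null when no deadlock is found.
        long[] ids = threads.findDeadlockedThreads();

        return ids == null ? 0 : ids.length;
    }

    /** Returns the accumulated collection time, in milliseconds, summed over all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it.

            if (time > 0)
                total += time;
        }

        return total;
    }
}
```

Sampling totalGcTimeMillis() periodically and comparing consecutive values
gives the GC time spent in each interval, which could then be published as a
metric.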

I created a dedicated issue [1] to make sure all these metrics are
exposed in WebConsole or VisorConsole (which is preferred?)

4) Situations requiring an external monitoring implementation
- GC STW duration exceeds the maximum allowed length (the node should be
stopped before the STW pause finishes)
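
A thread inside the node cannot react while the JVM is paused, which is why
this case needs a separate process. One possible shape, purely as a sketch:
the node periodically writes a heartbeat timestamp to a file, and an external
watchdog treats the node as stuck once the heartbeat is older than the
allowed pause. The class HeartbeatWatchdog, its methods, and the file-based
heartbeat are all assumptions made for illustration, not an existing Ignite
mechanism. What the watchdog does with a stale node (for example, killing the
process) is left to the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not an Ignite class.
final class HeartbeatWatchdog {
    /** Returns true when the last heartbeat is older than the allowed pause. */
    static boolean isStale(long lastBeatMillis, long nowMillis, long maxPauseMillis) {
        return nowMillis - lastBeatMillis > maxPauseMillis;
    }

    /** Node side: records the current time as the latest heartbeat. */
    static void beat(Path file) throws IOException {
        byte[] data = Long.toString(System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8);

        // A production version would write atomically (temp file plus move).
        Files.write(file, data);
    }

    /** Watchdog side: reads the timestamp of the latest heartbeat. */
    static long lastBeat(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);

        return Long.parseLong(text.trim());
    }
}
```

The watchdog process would call lastBeat() on a timer and stop the node when
isStale(lastBeat, System.currentTimeMillis(), maxPause) returns true.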

All these problems were reported by different people at different times,
so we should reanalyze each of them and, possibly, find better ways to
solve them than those described in the issues.

P.S. IEP-7 [2] already contains 9 issues; feel free to mention anything
else :)

[1] https://issues.apache.org/jira/browse/IGNITE-6961
[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-7%3A+Ignite+internal+problems+detection
