Hi Anton,

I don't think that we should shut down the node in case of IgniteOOMException.
If one node has run out of space, the other nodes probably have too, so
rebalancing will cause IgniteOOM on all the remaining nodes and kill the whole
cluster. I think that, for some configurations, the cluster should survive and
allow the user to clean the cache and/or add more nodes.

Thanks,
Mikhail.

On 20 Nov 2017 at 6:53 PM, "Anton Vinogradov" <avinogra...@gridgain.com> wrote:

> Igniters,
>
> Internal problems may, and unfortunately do, cause unexpected cluster
> behavior.
> We should define the expected behavior for the case when any internal
> problem happens.
>
> Well-known internal problems can be split into:
> 1) OOM or any other reason causing a node crash
>
> 2) Situations requiring graceful node shutdown with custom notification
> - IgniteOutOfMemoryException
> - Persistence errors
> - ExchangeWorker exits with error
>
> 3) Performance issues that should be covered by metrics
> - GC STW duration
> - Timed out tasks and jobs
> - TX deadlock
> - Hung Tx (waiting for some service)
> - Java Deadlocks
>
> I created a special issue [1] to make sure all these metrics are
> presented in WebConsole or VisorConsole (which one is preferred?)
>
> 4) Situations requiring an external monitoring implementation
> - GC STW duration exceeds the maximum allowed length (the node should be
> stopped before the STW pause finishes)
>
> All these problems were reported by different people at different times,
> so we should reanalyze each of them and, possibly, find better ways to
> solve them than those described in the issues.
>
> P.S. IEP-7 [2] already contains 9 issues; feel free to mention anything
> else :)
>
> [1] https://issues.apache.org/jira/browse/IGNITE-6961
> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-7%3A+Ignite+internal+problems+detection
>
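For reference, two of the category 3 metrics listed above (GC time and Java deadlocks) can be sampled in any JVM through the standard java.lang.management beans. The sketch below is only an illustration of that JDK API, not Ignite's implementation; the class name JvmHealthProbe is hypothetical, and getCollectionTime() reports accumulated collection time, which only approximates STW time for some collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical probe sampling two JVM-level health metrics via standard JMX beans. */
public class JvmHealthProbe {

    /** Accumulated GC time (ms) across all collectors; collectors reporting -1 (unsupported) are skipped. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0)
                total += t;
        }
        return total;
    }

    /** Names of threads deadlocked on monitors or ownable synchronizers; empty if none. */
    public static List<String> deadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        List<String> names = new ArrayList<>();
        if (ids == null)
            return names;
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) // a thread may have terminated since detection
                names.add(info.getThreadName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("GC time, ms: " + totalGcTimeMillis());
        System.out.println("Deadlocked threads: " + deadlockedThreads());
    }
}
```

Sampling totalGcTimeMillis() periodically and diffing consecutive values gives GC time per interval. A pause long enough to require stopping the node still needs the external monitoring from category 4, since an in-process probe cannot run while the JVM is in a STW pause.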
