It seems you're talking about Failure Detection (Timeouts). Will it detect a node failure on an idle cluster?
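
To make the question concrete, here is a minimal sketch of the kind of periodic aliveness check described in the "Desired" scenario quoted below. It is plain Java and does not use Ignite's API: the class `HeartbeatWatchdog`, its methods, and the timeout value are all hypothetical names chosen for illustration, and this is not a description of how the discovery SPI works internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical watchdog sketch; NOT part of Apache Ignite. */
class HeartbeatWatchdog {
    /** A node is considered failed once it has been silent longer than this (assumed value). */
    private final long timeoutMillis;

    /** Timestamp of the last heartbeat received from each node. */
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that nodeId was seen alive at nowMillis. */
    void heartbeat(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    /** Returns, in sorted order, the nodes whose last heartbeat is older than the timeout. */
    List<String> failedNodes(long nowMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                failed.add(e.getKey());
            }
        }
        Collections.sort(failed);
        return failed;
    }
}
```

The point of the sketch is that `failedNodes` can be called on a timer, independently of any user operation, so a crashed node would be noticed even when the cluster has no load.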

On Wed, Apr 8, 2020 at 11:52 AM Stephen Darlington <stephen.darling...@gridgain.com> wrote:

> The configuration parameters that I’m aware of are here:
>
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html
>
> Other people would be better placed to discuss the internals.
>
> Regards.
> Stephen
>
> > On 8 Apr 2020, at 09:32, Anton Vinogradov <a...@apache.org> wrote:
> >
> > Stephen,
> >
> >> Nodes check on their neighbours and notify the remaining nodes if one
> >> disappears.
> > Could you explain how this works in detail?
> > How can I set/change check frequency?
> >
> > On Wed, Apr 8, 2020 at 11:13 AM Stephen Darlington <stephen.darling...@gridgain.com> wrote:
> >
> >> This is one of the functions of the DiscoverySPI. Nodes check on their
> >> neighbours and notify the remaining nodes if one disappears. When the
> >> topology changes, it triggers a rebalance, which relocates primary
> >> partitions to live nodes. This is entirely transparent to clients.
> >>
> >> It gets more complex… like there’s the partition loss policy and
> >> rebalancing doesn’t always happen (configurable, persistence, etc)… but
> >> broadly it does as you expect.
> >>
> >> Regards,
> >> Stephen
> >>
> >>> On 8 Apr 2020, at 08:40, Anton Vinogradov <a...@apache.org> wrote:
> >>>
> >>> Igniters,
> >>> Do we have some feature allows to check nodes aliveness on a regular
> >>> basis?
> >>>
> >>> Scenario:
> >>> Precondition
> >>> The cluster has no load but some node's JVM crashed.
> >>>
> >>> Expected actual
> >>> The user performs an operation (eg. cache put) related to this node (via
> >>> another node) and waits for some timeout to gain it's dead.
> >>> The cluster starts the switch to relocate primary partitions to alive
> >>> nodes.
> >>> Now user able to retry the operation.
> >>>
> >>> Desired
> >>> Some WatchDog checks nodes aliveness on a regular basis.
> >>> Once a failure detected, the cluster starts the switch.
> >>> Later, the user performs an operation on an already fixed cluster and
> >>> waits for nothing.
> >>>
> >>> It would be good news if the "Desired" case is already Actual.
> >>> Can somebody point to the feature that performs this check?