Re: Active nodes aliveness WatchDog

Stephen Darlington Wed, 08 Apr 2020 01:13:46 -0700

This is one of the functions of the DiscoverySpi. Nodes check on their 
neighbours and notify the remaining nodes if one disappears. The resulting 
topology change triggers a rebalance, which relocates primary partitions to 
live nodes. This is entirely transparent to clients.
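
To make the idea concrete, here is a minimal toy sketch in plain Java. It is 
not Ignite code and not the real discovery protocol. The class name 
RingWatchdogSketch, its methods, and the modulo-based partition assignment 
are all made up for illustration. It only shows the shape of the mechanism: 
each node watches its ring neighbour, a failure is detected without any user 
operation, and primaries are reassigned to the surviving nodes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: this is NOT Ignite code and not the real discovery protocol.
class RingWatchdogSketch {
    // Known topology in ring order; each node monitors the next one.
    private final List<String> ring = new ArrayList<>();
    // Nodes whose (simulated) JVM is still running.
    private final Set<String> alive = new HashSet<>();
    // Partition number -> node currently holding the primary copy.
    private final Map<Integer, String> primaries = new TreeMap<>();
    private final int partitions;

    RingWatchdogSketch(List<String> nodes, int partitions) {
        if (nodes.isEmpty())
            throw new IllegalArgumentException("need at least one node");
        ring.addAll(nodes);
        alive.addAll(nodes);
        this.partitions = partitions;
        rebalance();
    }

    // Simulates a JVM crash: the node dies, but the topology is not updated yet.
    void crash(String node) {
        alive.remove(node);
    }

    // One check round: every live node probes its ring neighbour. Dead
    // neighbours are dropped from the topology and primaries are reassigned.
    List<String> checkNeighbours() {
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < ring.size(); i++) {
            String node = ring.get(i);
            String neighbour = ring.get((i + 1) % ring.size());
            if (alive.contains(node) && !alive.contains(neighbour))
                failed.add(neighbour);
        }
        if (!failed.isEmpty()) {
            ring.removeAll(failed); // "notify the remaining nodes"
            rebalance();            // topology change triggers a rebalance
        }
        return failed;
    }

    // Simplified assignment by modulo; a real affinity function is smarter.
    private void rebalance() {
        primaries.clear();
        for (int p = 0; p < partitions; p++)
            primaries.put(p, ring.get(p % ring.size()));
    }

    String primaryOf(int partition) {
        return primaries.get(partition);
    }

    List<String> topology() {
        return new ArrayList<>(ring);
    }

    public static void main(String[] args) {
        RingWatchdogSketch cluster = new RingWatchdogSketch(List.of("A", "B", "C"), 6);
        cluster.crash("B");
        System.out.println("failed:    " + cluster.checkNeighbours());
        System.out.println("topology:  " + cluster.topology());
        System.out.println("primaries: " + cluster.primaries);
    }
}
```

The point of the sketch is that checkNeighbours() runs on its own schedule, 
so the crash is noticed even when nobody is issuing cache operations.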

In practice it gets more complex: the partition loss policy comes into play, 
and rebalancing doesn't always happen (it's configurable and depends on 
persistence, among other things). But broadly it does what you expect.
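
For reference, a hedged sketch of where those knobs live on the cache 
configuration. The cache name "myCache" and the chosen values are 
assumptions for illustration only, not recommendations. This fragment needs 
the Ignite core library on the classpath.

```java
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

// Illustrative configuration only; name and values are assumptions.
class CacheSetupSketch {
    static CacheConfiguration<Integer, String> cacheConfig() {
        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("myCache");

        // Keep one backup copy so a primary can be relocated when a node dies.
        ccfg.setBackups(1);

        // What operations are allowed if all copies of a partition are lost.
        ccfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

        // Rebalancing is configurable: SYNC, ASYNC, or NONE (disabled).
        ccfg.setRebalanceMode(CacheRebalanceMode.ASYNC);

        return ccfg;
    }
}
```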

Regards,
Stephen

> On 8 Apr 2020, at 08:40, Anton Vinogradov <[email protected]> wrote:
> 
> Igniters,
> Do we have a feature that checks node aliveness on a regular basis?
> 
> Scenario:
> Precondition
>  The cluster has no load, but some node's JVM has crashed.
> 
> Expected actual
>  The user performs an operation (e.g. cache put) related to this node (via
> another node) and waits for some timeout to learn that it's dead.
>  The cluster starts the switch to relocate primary partitions to alive
> nodes.
>  Now the user is able to retry the operation.
> 
> Desired
>  Some WatchDog checks node aliveness on a regular basis.
>  Once a failure is detected, the cluster starts the switch.
>  Later, the user performs an operation on an already fixed cluster and
> waits for nothing.
> 
> It would be good news if the "Desired" case is already Actual.
> Can somebody point to the feature that performs this check?
