Ivan, there is already a metric, CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies, which shows the current redundancy level for the cache group. We can lose up to (getMinimumNumberOfPartitionCopies - 1) nodes without data loss in this cache group.
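To make the arithmetic concrete, here is a minimal sketch of how a cluster-wide "safe to leave" count could be derived from that metric. It assumes the getMinimumNumberOfPartitionCopies value has already been collected for each cache group (e.g. via JMX); the class and method names are hypothetical, not part of any Ignite API:

```java
import java.util.Map;

public class RedundancyCheck {
    /**
     * Given getMinimumNumberOfPartitionCopies per cache group, returns how
     * many nodes can leave the cluster without losing data. The cluster-wide
     * answer is limited by the least-redundant cache group: if some group has
     * only N copies of its worst partition, losing N nodes may lose data.
     */
    public static int safeToLeaveNodesCount(Map<String, Integer> minCopiesPerGroup) {
        int minCopies = minCopiesPerGroup.values().stream()
            .mapToInt(Integer::intValue)
            .min()
            .orElse(0); // no cache groups -> nothing to lose, report 0
        return Math.max(0, minCopies - 1);
    }
}
```

With backups=1 and rebalancing complete, the worst group has 2 copies of every partition, so one node can safely leave; while rebalancing is in progress the minimum may drop to 1, and the count becomes 0.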
Fri, Oct 4, 2019 at 16:17, Ivan Rakov <ivan.glu...@gmail.com>:
> Igniters,
>
> I've seen numerous requests for an easy way to check whether it is
> safe to turn off a cluster node. As we know, in Ignite protection from
> sudden node shutdown is implemented by keeping several backup
> copies of each partition. However, this guarantee can be temporarily
> weakened if the cluster has recently experienced a node restart and
> the rebalancing process is still in progress.
> An example scenario is restarting nodes one by one in order to update a
> local configuration parameter. The user restarts one node and rebalancing
> starts: when it completes, it will be safe to proceed (backup
> count=1). However, there's no transparent way to determine whether
> rebalancing is over.
> From my perspective, it would be very helpful to:
> 1) Add information about rebalancing and the number of free-to-go nodes
> to the ./control.sh --state command.
> Examples of output:
>
> > Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
> > Cluster tag: new_tag
> > --------------------------------------------------------------------------------
> > Cluster is active
> > All partitions are up-to-date.
> > 3 node(s) can safely leave the cluster without partition loss.
>
> > Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
> > Cluster tag: new_tag
> > --------------------------------------------------------------------------------
> > Cluster is active
> > Rebalancing is in progress.
> > 1 node(s) can safely leave the cluster without partition loss.
>
> 2) Provide the same information via ClusterMetrics. For example:
> ClusterMetrics#isRebalanceInProgress // boolean
> ClusterMetrics#getSafeToLeaveNodesCount // int
>
> Here I need to mention that this information can be calculated from
> existing rebalance metrics (see CacheMetrics#*rebalance*). However, I
> still think that we need a simpler and more understandable flag for
> whether the cluster is in danger of data loss.
> Another point is that the current metrics are bound to a specific
> cache, which makes this information even harder to analyze.
>
> Thoughts?
>
> --
> Best Regards,
> Ivan Rakov