Ivan, there is already a metric, CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies, which shows the current redundancy level for the cache group. We can lose up to (getMinimumNumberOfPartitionCopies - 1) nodes without data loss in this cache group.
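To make the arithmetic concrete, here is a minimal sketch of how a cluster-wide "safe to leave" count could be derived from that metric. It assumes the getMinimumNumberOfPartitionCopies value has already been collected for each cache group (e.g. via JMX); the class and method names are hypothetical, not part of any Ignite API:

```java
import java.util.Map;

public class RedundancyCheck {
    /**
     * Given getMinimumNumberOfPartitionCopies per cache group, returns how
     * many nodes can leave the cluster without losing data. The cluster-wide
     * answer is limited by the least-redundant cache group: if some group has
     * only N copies of its worst partition, losing N nodes may lose data.
     */
    public static int safeToLeaveNodesCount(Map<String, Integer> minCopiesPerGroup) {
        int minCopies = minCopiesPerGroup.values().stream()
            .mapToInt(Integer::intValue)
            .min()
            .orElse(0); // no cache groups -> nothing to lose, report 0
        return Math.max(0, minCopies - 1);
    }
}
```

With backups=1 and rebalancing complete, the worst group has 2 copies of every partition, so one node can safely leave; while rebalancing is in progress the minimum may drop to 1, and the count becomes 0.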
Fri, Oct 4, 2019 at 16:17, Ivan Rakov <ivan.glu...@gmail.com>:
> Igniters,
>
> I've seen numerous requests for an easy way to check whether it is
> safe to turn off a cluster node. As we know, in Ignite protection from
> sudden node shutdown is implemented by keeping several backup
> copies of each partition. However, this guarantee can be temporarily
> weakened if the cluster has recently experienced a node restart and
> the rebalancing process is still in progress.
> An example scenario is restarting nodes one by one in order to update a
> local configuration parameter. The user restarts one node and rebalancing
> starts: when it completes, it will be safe to proceed (backup
> count=1). However, there's no transparent way to determine whether
> rebalancing is over.
> From my perspective, it would be very helpful to:
> 1) Add information about rebalancing and the number of free-to-go nodes
> to the ./control.sh --state command.
> Examples of output:
>
> > Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
> > Cluster tag: new_tag
> > --------------------------------------------------------------------------------
> > Cluster is active
> > All partitions are up-to-date.
> > 3 node(s) can safely leave the cluster without partition loss.
>
> > Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
> > Cluster tag: new_tag
> > --------------------------------------------------------------------------------
> > Cluster is active
> > Rebalancing is in progress.
> > 1 node(s) can safely leave the cluster without partition loss.
>
> 2) Provide the same information via ClusterMetrics. For example:
> ClusterMetrics#isRebalanceInProgress // boolean
> ClusterMetrics#getSafeToLeaveNodesCount // int
>
> Here I need to mention that this information can be calculated from
> existing rebalance metrics (see CacheMetrics#*rebalance*). However, I
> still think that we need a simpler and more understandable flag for
> whether the cluster is in danger of data loss.
> Another point is that the current metrics are bound to a specific
> cache, which makes this information even harder to analyze.
>
> Thoughts?
>
> --
> Best Regards,
> Ivan Rakov