[
https://issues.apache.org/jira/browse/IGNITE-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692197#comment-17692197
]
Denis Chudov commented on IGNITE-17056:
---------------------------------------
[~kgusakov] lgtm.
> Design rebalance cancel mechanism
> ---------------------------------
>
> Key: IGNITE-17056
> URL: https://issues.apache.org/jira/browse/IGNITE-17056
> Project: Ignite
> Issue Type: Task
> Reporter: Mirza Aliev
> Assignee: Kirill Gusakov
> Priority: Major
> Labels: ignite-3
> Time Spent: 40m
> Remaining Estimate: 0h
>
> There are cases when the current leader cannot perform a rebalance on the
> specified set of nodes, for example, when some node from the raft group
> permanently fails with RaftError#ECATCHUP. For such scenarios a retry
> mechanism is implemented in IGNITE-16801, but we cannot retry a rebalance
> intent indefinitely, so a mechanism for canceling a rebalance should be
> implemented.
> Naive canceling could be implemented by removing the pending key and
> replacing it with the planned key. But this approach has several crucial
> limitations and may cause inconsistency in the current rebalance protocol,
> for example, when there is a race between the cancel and the new leader
> applying a new assignment to the stable key. The pending key may be removed
> right before the new assignment is applied to the stable key, and then we
> cannot resolve peers to ClusterIds, because that resolution is done on the
> union of the pending and stable keys.
> There is also a case in which we can lose a planned rebalance:
> # Current leader retries the failed rebalance
> # Current leader stops being leader for some reason and sleeps
> # New leader performs the rebalance and calls
> RebalanceRaftGroupEventsListener#onNewPeersConfigurationApplied
> # At this moment the old leader wakes up and cancels the current rebalance,
> so it removes the pending key and writes the planned key in its place.
> # At this moment we receive
> RebalanceRaftGroupEventsListener#onNewPeersConfigurationApplied from the new
> leader, see that planned is empty, and so just delete the pending key. But
> it is not correct to delete this key, because the rebalance associated with
> it has not been performed yet.
> We should also consider separating the scenarios for recoverable and
> unrecoverable errors, because it might be useless to retry a rebalance if
> some participating node fails with an unrecoverable error.
> It seems we should think carefully about introducing some failure handling
> for such exceptional scenarios.
> The new node role from https://issues.apache.org/jira/browse/IGNITE-17252,
> the primary replica, can help us resolve this issue in a simpler way and
> cancel the rebalance from the primary replica.
>
> As a result of this issue we must design a correct algorithm for canceling
> a hung rebalance.
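
The lost planned rebalance case from the numbered list in the quoted description can be replayed with a toy model. This is only a sketch under stated assumptions: the class RebalanceCancelRace, the in-memory map standing in for the meta storage, the node names A to E, and the simplified listener body are all hypothetical and are not Ignite code. It only mirrors the key handling the description mentions (stable, pending, planned).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of the stable/pending/planned keys. Hypothetical, not Ignite code. */
public class RebalanceCancelRace {
    /** In-memory stand-in for the meta storage: key name -> assigned nodes. */
    public final Map<String, Set<String>> keys = new HashMap<>();

    /** Naive cancel: remove pending and write the planned assignment in its place. */
    public void naiveCancel() {
        Set<String> planned = keys.remove("planned");
        if (planned != null) {
            keys.put("pending", planned);
        } else {
            keys.remove("pending");
        }
    }

    /** Simplified reaction to onNewPeersConfigurationApplied (an assumption). */
    public void onNewPeersConfigurationApplied(Set<String> appliedPeers) {
        keys.put("stable", appliedPeers);
        Set<String> planned = keys.remove("planned");
        if (planned != null) {
            // A planned rebalance exists: it becomes the next pending one.
            keys.put("pending", planned);
        } else {
            // Planned is empty: the listener assumes pending is finished.
            keys.remove("pending");
        }
    }

    /** Replays the five steps of the scenario in one thread, in the racy order. */
    public static RebalanceCancelRace replay() {
        RebalanceCancelRace s = new RebalanceCancelRace();
        s.keys.put("stable", Set.of("A", "B", "C"));
        s.keys.put("pending", Set.of("A", "B", "D"));
        s.keys.put("planned", Set.of("A", "B", "E"));

        // Steps 1-3: the old leader sleeps; the new leader performs the pending
        // rebalance, and its notification is still in flight.
        Set<String> appliedByNewLeader = s.keys.get("pending");

        // Step 4: the old leader wakes up and cancels naively.
        s.naiveCancel();

        // Step 5: the notification arrives, sees no planned key, deletes pending.
        s.onNewPeersConfigurationApplied(appliedByNewLeader);
        return s;
    }

    public static void main(String[] args) {
        System.out.println(replay().keys);
    }
}
```

After the replay only the stable key remains, holding the assignment applied by the new leader. The assignment that was planned (A, B, E in this sketch) was moved into pending by the cancel and then deleted by the listener, so it is never performed.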