[ 
https://issues.apache.org/jira/browse/IGNITE-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-17056:
---------------------------------
    Description: 
There are cases when the current leader cannot perform a rebalance on the 
specified set of nodes, for example, when some node from the raft group 
permanently fails with {{RaftError#ECATCHUP}}. A retry mechanism for such 
scenarios was implemented in IGNITE-16801, but we cannot retry a rebalance 
intent infinitely, so a mechanism for canceling a rebalance should be 
implemented. 
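
For illustration only, a rough, self-contained Java sketch of a bounded retry 
that falls back to cancellation; the class, the helper methods and the 
{{MAX_RETRIES}} constant are hypothetical and are not taken from the existing 
code:

{code:java}
// Hypothetical sketch (not the actual Ignite code): retry a failed rebalance a
// bounded number of times, then fall back to cancelling the intent instead of
// retrying it forever.
public class BoundedRebalanceRetrySketch {
    private static final int MAX_RETRIES = 10; // illustrative limit

    /** Called when a rebalance attempt fails, e.g. with ECATCHUP. */
    void onRebalanceFailed(String raftErrorCode, int attempt) {
        if (attempt < MAX_RETRIES) {
            scheduleRetry(attempt + 1); // IGNITE-16801 behaviour: retry the same intent
        } else {
            cancelRebalance();          // proposed behaviour: stop retrying and cancel
        }
    }

    void scheduleRetry(int nextAttempt) { /* re-submit the change-peers request */ }

    void cancelRebalance() { /* drop the rebalance intent, see the sketches below */ }
}
{code}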

Naive canceling could be implemented by removing the {{pending key}} and 
replacing it with the {{planned key}}. But this approach has several crucial 
limitations and may cause inconsistencies in the current rebalance protocol, 
for example, when there is a race between the cancel and the new leader 
applying the new assignment to the {{stable key}}: the cancel can remove the 
{{pending key}} right before the new assignment is applied to the 
{{stable key}}, and then we cannot resolve peers to ClusterIds, because that 
resolution is made on the union of the pending and stable keys. 
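
A minimal sketch of this naive cancel, using a plain map as a stand-in for the 
meta storage (the {{NaiveCancelSketch}} class and the key names are 
illustrative, not the real API):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the naive cancel: drop the pending assignment and
// promote the planned one in its place. The map stands in for the meta storage.
public class NaiveCancelSketch {
    private final Map<String, byte[]> metastore = new ConcurrentHashMap<>();

    void cancelRebalance(String pendingKey, String plannedKey) {
        byte[] planned = metastore.remove(plannedKey);

        if (planned == null) {
            // Nothing is planned: just drop the pending rebalance.
            metastore.remove(pendingKey);
        } else {
            // Replace the pending assignment with the planned one.
            metastore.put(pendingKey, planned);
        }
    }
}
{code}

Note that nothing here is conditioned on who the current leader is or on 
whether the stable key has already been updated, which is exactly why such a 
cancel can race with the new leader's updates.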

There is also a case where we can lose a planned rebalance (the handler logic 
involved is sketched after the list):
 # The current leader retries a failed rebalance.
 # The current leader stops being the leader for some reason and goes to sleep.
 # The new leader performs the rebalance and calls 
{{RebalanceRaftGroupEventsListener#onNewPeersConfigurationApplied}}.
 # At this moment the old leader wakes up and cancels the current rebalance, so 
it removes the pending key and writes the planned key's value into it.
 # At this moment we receive 
{{RebalanceRaftGroupEventsListener#onNewPeersConfigurationApplied}} from the 
new leader, see that planned is empty, and simply delete the pending key. This 
is not correct, because the rebalance associated with the removed key has not 
been performed yet.
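
A hedged sketch of what the handler presumably does in this scenario (again a 
plain map stands in for the meta storage, and all names are illustrative); the 
unconditional delete in the {{planned == null}} branch is the step that drops 
the rebalance written by the old leader's cancel:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the role of
// RebalanceRaftGroupEventsListener#onNewPeersConfigurationApplied in this scenario.
// The map stands in for the meta storage; key names are illustrative.
public class OnNewPeersAppliedSketch {
    private final Map<String, byte[]> metastore = new ConcurrentHashMap<>();

    void onNewPeersConfigurationApplied(String stableKey, String pendingKey,
                                        String plannedKey, byte[] newAssignment) {
        // Step 3: the new leader publishes the applied assignment.
        metastore.put(stableKey, newAssignment);

        byte[] planned = metastore.remove(plannedKey);

        if (planned == null) {
            // Step 5: planned looks empty, so the pending key is deleted unconditionally.
            // If the old leader's cancel has just moved its planned assignment into the
            // pending key (step 4), this delete silently loses that rebalance.
            metastore.remove(pendingKey);
        } else {
            // Otherwise the planned rebalance becomes the next pending one.
            metastore.put(pendingKey, planned);
        }
    }
}
{code}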

We should also consider separating the scenarios for recoverable and 
unrecoverable errors, because it might be useless to retry a rebalance if some 
participating node fails with an unrecoverable error. 
It seems we should think carefully about introducing proper failure handling 
for such exceptional scenarios. 
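
For illustration, one possible shape of such a separation (the verdicts and 
the set of recoverable codes below are purely illustrative):

{code:java}
// Hypothetical sketch: classify a rebalance failure before deciding whether to
// retry it or to hand it over to failure handling.
public class RebalanceErrorClassifierSketch {
    enum Verdict { RETRY, FAIL }

    /** The set of "recoverable" codes here is illustrative, not taken from the issue. */
    Verdict classify(String raftErrorCode) {
        switch (raftErrorCode) {
            case "ECATCHUP": // a node temporarily cannot catch up: worth retrying
                return Verdict.RETRY;
            default:         // treat everything else as unrecoverable
                return Verdict.FAIL;
        }
    }
}
{code}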
 

> Implement rebalance cancel mechanism
> ------------------------------------
>
>                 Key: IGNITE-17056
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17056
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
