[
https://issues.apache.org/jira/browse/IGNITE-23633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18041097#comment-18041097
]
Vladislav Pyatkov commented on IGNITE-23633:
--------------------------------------------
Merged 6f9322ca6863f631332357624c18799ec35b6762
> Retry ChangePeersAndLearnersRequest on fail while pendings handling
> -------------------------------------------------------------------
>
> Key: IGNITE-23633
> URL: https://issues.apache.org/jira/browse/IGNITE-23633
> Project: Ignite
> Issue Type: Improvement
> Reporter: Kirill Gusakov
> Assignee: Kirill Sizov
> Priority: Major
> Labels: MakeTeamcityGreenAgain, ha-instability, ignite-3
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> *Motivation*
> Currently, the future returned by
> org.apache.ignite.internal.raft.client.TopologyAwareRaftGroupService#changePeersAndLearnersAsync
> is simply handed back to the metastore notification thread. As a result, on
> failure, onError is called:
> {code:java}
> private WatchListener createPendingAssignmentsRebalanceListener() {
>     return new WatchListener() {
>         @Override
>         public CompletableFuture<Void> onUpdate(WatchEvent evt) {
>             if (!busyLock.enterBusy()) {
>                 return failedFuture(new NodeStoppingException());
>             }
>             try {
>                 Entry newEntry = evt.entryEvent().newEntry();
>                 return handleChangePendingAssignmentEvent(newEntry, evt.revision(), false);
>             } finally {
>                 busyLock.leaveBusy();
>             }
>         }
>
>         @Override
>         public void onError(Throwable e) {
>             LOG.warn("Unable to process pending assignments event", e);
>         }
>     };
> }
> {code}
> which, by design, only logs the failure and otherwise ignores it. As a result,
> the partition group can be left in an invalid state indefinitely.
> *Definition of Done*
> - Implement a retry loop for changePeersAndLearnersAsync that retries on any error.
> *Implementation details*
> - Retries of changePeersAndLearnersAsync must reuse the term obtained on the
> first attempt, to prevent issues caused by stale retries.
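The two requirements above (retry on any error, and reuse the term from the first attempt) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the merged fix: `RaftGroupClient`, `currentTerm()`, `changePeersAndLearners(List<String>, long)`, `PinnedTermRetry`, the fixed retry delay and the `stopping` flag are all hypothetical stand-ins, and the real `TopologyAwareRaftGroupService` API may look different.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a Raft group client; not Ignite's actual interface. */
interface RaftGroupClient {
    /** Asynchronously resolves the current leader term. */
    CompletableFuture<Long> currentTerm();

    /** Asks the leader to apply the target configuration under the given term. */
    CompletableFuture<Void> changePeersAndLearners(List<String> peersAndLearners, long term);
}

/** Hypothetical helper: retries a configuration change on any error, pinned to one term. */
final class PinnedTermRetry {
    private PinnedTermRetry() {}

    /**
     * Reads the term once, then retries the change with that same term on any
     * error until it succeeds or {@code stopping} reports that the node is shutting down.
     * A failure to read the term itself is propagated as-is in this sketch.
     */
    static CompletableFuture<Void> changeWithRetry(
            RaftGroupClient client, List<String> target, long delayMs, BooleanSupplier stopping) {
        return client.currentTerm().thenCompose(term -> {
            CompletableFuture<Void> result = new CompletableFuture<>();
            attempt(client, target, term, delayMs, stopping, result);
            return result;
        });
    }

    private static void attempt(RaftGroupClient client, List<String> target, long term,
            long delayMs, BooleanSupplier stopping, CompletableFuture<Void> result) {
        // Give up only when the node is stopping; every other error is retried.
        if (stopping.getAsBoolean()) {
            result.completeExceptionally(new IllegalStateException("Node is stopping"));
            return;
        }
        CompletableFuture<Void> fut;
        try {
            fut = client.changePeersAndLearners(target, term);
        } catch (RuntimeException e) {
            // Treat a synchronous throw like an asynchronous failure.
            fut = CompletableFuture.failedFuture(e);
        }
        fut.whenComplete((ignored, err) -> {
            if (err == null) {
                result.complete(null);
            } else {
                // Schedule another attempt after a fixed delay, reusing the term
                // from the first read instead of fetching a fresh one.
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS)
                        .execute(() -> attempt(client, target, term, delayMs, stopping, result));
            }
        });
    }
}
```

Reading the term outside the retry loop is the key design choice: every retry carries the term under which the change was first requested, which is what the ticket relies on to avoid stale retries. The retries are unbounded to match "on any error"; the `stopping` check plays roughly the role that `busyLock.enterBusy()` plays in the listener above.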