[ 
https://issues.apache.org/jira/browse/IGNITE-23633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18041097#comment-18041097
 ] 

Vladislav Pyatkov commented on IGNITE-23633:
--------------------------------------------

Merged 6f9322ca6863f631332357624c18799ec35b6762

> Retry ChangePeersAndLearnersRequest on fail while pendings handling
> -------------------------------------------------------------------
>
>                 Key: IGNITE-23633
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23633
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Kirill Gusakov
>            Assignee:  Kirill Sizov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, ha-instability, ignite-3
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> *Motivation*
> Currently, the future returned by 
> org.apache.ignite.internal.raft.client.TopologyAwareRaftGroupService#changePeersAndLearnersAsync
>  is simply handed back to the metastore notification thread. As a result, 
> on error we only call onError:
> {code:java}
>  private WatchListener createPendingAssignmentsRebalanceListener() {
>         return new WatchListener() {
>             @Override
>             public CompletableFuture<Void> onUpdate(WatchEvent evt) {
>                 if (!busyLock.enterBusy()) {
>                     return failedFuture(new NodeStoppingException());
>                 }
>                 try {
>                     Entry newEntry = evt.entryEvent().newEntry();
>                     return handleChangePendingAssignmentEvent(newEntry, evt.revision(), false);
>                 } finally {
>                     busyLock.leaveBusy();
>                 }
>             }
>             @Override
>             public void onError(Throwable e) {
>                 LOG.warn("Unable to process pending assignments event", e);
>             }
>         };
>     }
> {code}
> and the failure is then ignored by design. As a result, the partition group 
> can remain in an invalid state indefinitely.
> *Definition of Done*
> - Implement a retry loop for changePeersAndLearnersAsync that retries on any error
> *Implementation details*
> - We must retry changePeersAndLearnersAsync with the term received on the 
> first iteration, to prevent issues caused by stale retries.
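
The implementation note above can be sketched roughly as follows. This is a minimal illustration only, not the merged change: ChangePeersCall, retryWithFixedTerm and the attempt limit are hypothetical names introduced here, and the real changePeersAndLearnersAsync signature is not reproduced. The point it shows is that the term is captured once and reused unchanged on every retry, so a retry issued after leadership has moved on stays tied to the original term.

```java
import java.util.concurrent.CompletableFuture;

/** Sketch of a retry loop that reuses the term from the first attempt (all names are hypothetical). */
final class ChangePeersRetrySketch {
    /** Hypothetical stand-in for the asynchronous change-peers call; it is given the term to use. */
    interface ChangePeersCall {
        CompletableFuture<Void> invoke(long term);
    }

    /**
     * Invokes the call and, on any error, invokes it again with the same term.
     * The attempt limit is an assumption of this sketch; the issue does not specify one.
     */
    static CompletableFuture<Void> retryWithFixedTerm(ChangePeersCall call, long term, int attemptsLeft) {
        return call.invoke(term)
                .handle((res, err) -> {
                    if (err == null) {
                        // Success: nothing left to retry.
                        return CompletableFuture.<Void>completedFuture(null);
                    }
                    if (attemptsLeft <= 1) {
                        // Out of attempts: surface the last error to the caller.
                        return CompletableFuture.<Void>failedFuture(err);
                    }
                    // Retry with the term captured on the first iteration, never a fresh one.
                    return retryWithFixedTerm(call, term, attemptsLeft - 1);
                })
                // Flatten the nested future produced by handle().
                .thenCompose(f -> f);
    }

    private ChangePeersRetrySketch() {
    }
}
```

A caller would read the term once, before the first attempt, and pass it in, for example retryWithFixedTerm(call, termFromFirstIteration, 5). A real implementation would presumably also stop retrying when the node is stopping, which this sketch leaves out.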



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
