[ 
https://issues.apache.org/jira/browse/RATIS-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze resolved RATIS-1781.
-------------------------------
    Resolution: Cannot Reproduce

[~liuyaolong], thanks for the update! Let's resolve this for now.

> Ratis raft conf keep an incorrect log entry index
> -------------------------------------------------
>
>                 Key: RATIS-1781
>                 URL: https://issues.apache.org/jira/browse/RATIS-1781
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Yaolong Liu
>            Priority: Major
>
> When transferring leadership, the Ratis shell throws the following exception:
> {code:java}
> bin/ratis sh election transfer -address xxxxxxx:xxxx -peers xxxxxxx:xxxx
> [main] INFO org.reflections.Reflections - Reflections took 55 ms to scan 1 
> urls, producing 5 keys and 18 values
> [main] INFO org.apache.ratis.metrics.MetricRegistries - Loaded 
> MetricRegistries class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> Applying new peer state before transferring leadership: 
> [xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER,
>  
> xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER,
>  
> xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:2|startupRole:FOLLOWER]
> Failed 
> SetConfigurationRequest:client-0716A1DE7C32->xxxxxxx:xxxx@group-ABB3109A44C1, 
> cid=2, seq=0, RW, null, SET_UNCONDITIONALLY, 
> servers:[xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER,
>  
> xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER,
>  
> xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:2|startupRole:FOLLOWER],
>  listeners:[] for 10 attempts with 
> org.apache.ratis.retry.ExponentialBackoffRetry@1e886a5b
> [main] ERROR org.apache.ratis.shell.cli.AbstractShell - Error running 
> election transfer -address xxxxxxx:xxxx -peers xxxxxxx:xxxx
> org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed 
> SetConfigurationRequest:client-0716A1DE7C32->xxxxxxx:xxxx@group-ABB3109A44C1, 
> cid=2, seq=0, RW, null, SET_UNCONDITIONALLY, 
> servers:[xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER,
>  
> xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER,
>  
> xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:2|startupRole:FOLLOWER],
>  listeners:[] for 10 attempts with 
> org.apache.ratis.retry.ExponentialBackoffRetry@1e886a5b
>         at 
> org.apache.ratis.client.impl.RaftClientImpl.noMoreRetries(RaftClientImpl.java:307)
>         at 
> org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:119)
>         at 
> org.apache.ratis.client.impl.AdminImpl.setConfiguration(AdminImpl.java:46)
>         at 
> org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:51)
>         at 
> org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:40)
>         at 
> org.apache.ratis.shell.cli.sh.election.TransferCommand.run(TransferCommand.java:80)
>         at 
> org.apache.ratis.shell.cli.AbstractShell.run(AbstractShell.java:104)
>         at org.apache.ratis.shell.cli.sh.RatisShell.main(RatisShell.java:43)
> Caused by: 
> org.apache.ratis.protocol.exceptions.ReconfigurationInProgressException: 
> Reconfiguration is already in progress: {color:red}199839671{color}: 
> peers:[xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER,
>  
> xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:2|startupRole:FOLLOWER,
>  
> xxxxxxx:xxxx|rpc:xxxxxxx:xxxx|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER]|listeners:[],
>  old=null
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.setConfigurationAsync(RaftServerImpl.java:1133)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$null$21(RaftServerProxy.java:607)
>         at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:117)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitServerRequestAsync$11(RaftServerImpl.java:809)
>         at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
> The exception is thrown by this code in RaftServerImpl#setConfigurationAsync:
> {code:java}
> ....
> // make sure there is no other raft reconfiguration in progress
> if (!current.isStable() || leaderState.inStagingState()
>     || !state.isConfCommitted()) {
>   throw new ReconfigurationInProgressException(
>       "Reconfiguration is already in progress: " + current);
> }
> ....
> {code}
> Checking with arthas shows that the first two conditions behave as expected, 
> but the last condition (!state.isConfCommitted()) is incorrect:
> {code:java}
> [arthas@262]$ watch org.apache.ratis.server.raftlog.RaftLogBase 
> getLastCommittedIndex '{returnObj}'
> Press Q or Ctrl+C to abort.
> method=org.apache.ratis.server.raftlog.RaftLogBase.getLastCommittedIndex 
> location=AtExit
> method=org.apache.ratis.server.raftlog.RaftLogBase.getLastCommittedIndex 
> location=AtExit
> Affect(class count: 3 , method count: 1) cost in 457 ms, listenerId: 5
> method=org.apache.ratis.server.raftlog.RaftLogBase.getLastCommittedIndex 
> location=AtExit
> ts=2023-02-13 15:19:22; [cost=0.107239ms] result=@ArrayList[
>     @Long[1433240],
> ]
> [arthas@262]$ watch org.apache.ratis.server.impl.RaftConfigurationImpl 
> getLogEntryIndex '{returnObj}'
> Press Q or Ctrl+C to abort.
> Affect(class count: 1 , method count: 1) cost in 188 ms, listenerId: 6
> method=org.apache.ratis.server.impl.RaftConfigurationImpl.getLogEntryIndex 
> location=AtExit
> ts=2023-02-13 15:21:03; [cost=0.079311ms] result=@ArrayList[
>     @Long[199839671],
> ]
> {code}
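> Now the log entry index of the current raft configuration (199839671, from 
> getLogEntryIndex) is much larger than the raft log's last committed index 
> (1433240), so the configuration is treated as not yet committed.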
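
The comparison reported above can be sketched in a few lines of Java. This is a minimal model only: it assumes the committed check amounts to "last committed index >= configuration log entry index", and the class and method names (ConfCommittedSketch, isConfCommitted) are hypothetical stand-ins, not the Ratis implementation. The two values are the ones captured by arthas in the report.

```java
// Hypothetical, simplified model of the check; not the actual Ratis classes.
public class ConfCommittedSketch {

  /**
   * Assumed rule: a configuration counts as committed once the log's
   * last committed index has reached the configuration's log entry index.
   */
  static boolean isConfCommitted(long lastCommittedIndex, long confLogEntryIndex) {
    return lastCommittedIndex >= confLogEntryIndex;
  }

  public static void main(String[] args) {
    long lastCommittedIndex = 1433240L;   // value seen for getLastCommittedIndex
    long confLogEntryIndex = 199839671L;  // value seen for getLogEntryIndex

    boolean committed = isConfCommitted(lastCommittedIndex, confLogEntryIndex);

    // Under the assumed rule, an uncommitted configuration makes the guard
    // in setConfigurationAsync reject every new reconfiguration request.
    System.out.println("isConfCommitted = " + committed);
    System.out.println("reconfiguration rejected = " + !committed);
  }
}
```

Under this assumption the guard cannot clear until the commit index reaches the configuration's index, which would explain why all 10 retry attempts failed with the same exception.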



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
