[ 
https://issues.apache.org/jira/browse/HDDS-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen resolved HDDS-6241.
------------------------------
    Fix Version/s: 1.4.0
       Resolution: Fixed

> Follower SCM node repeatedly sending requests to Ratis server
> -------------------------------------------------------------
>
>                 Key: HDDS-6241
>                 URL: https://issues.apache.org/jira/browse/HDDS-6241
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM HA
>            Reporter: George Huang
>            Assignee: Tsz-wo Sze
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> A follower SCM node in an SCM HA deployment repeatedly sends requests to the 
> Ratis server. The SCM node has been in this state for many days or even 
> weeks. An SCM log could look like this:
> 2022-02-01 11:54:35,413 INFO 
> org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler: 
> Moving container #290631 to CLOSED state, datanode 
> xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx{ip: xx.xx.xxx.xx, host: 
> xxxxxx.xxxxx.xxxxxxxx.com, ports: [REPLICATION=9886, RATIS=9858, 
> RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: 
> /default, certSerialId: null, persistedOpState: IN_SERVICE, 
> persistedOpStateExpiryEpochSec: 0} reported CLOSED replica.
> 2022-02-01 11:54:35,414 INFO 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler: Invoking method public 
> abstract void 
> org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerState(org.apache.hadoop.hdds.protocol.proto.HddsProtos$ContainerID,org.apache.hadoop.hdds.protocol.proto.HddsProtos$LifeCycleEvent)
>  throws 
> java.io.IOException,org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException
>  on target org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl@23e9a90a, cost 
> 124.728us
> 2022-02-01 11:54:35,414 ERROR 
> org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler: 
> Exception while processing ICR for container 290631
> org.apache.ratis.protocol.exceptions.NotLeaderException: Server 
> xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx@group-XXXXXXXXXXXX is not the leader 
> xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx|rpc:xxxxxx.xxxxx.xxxxxxxx.com:9894|admin:|client:|dataStream:|priority:0
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.generateNotLeaderException(RaftServerImpl.java:667)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.checkLeaderState(RaftServerImpl.java:632)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:758)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$submitClientRequestAsync$9(RaftServerProxy.java:437)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$null$7(RaftServerProxy.java:432)
>         at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:115)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$submitRequest$8(RaftServerProxy.java:432)
>         at 
> java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:995)
>         at 
> java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2137)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy.submitRequest(RaftServerProxy.java:431)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy.submitClientRequestAsync(RaftServerProxy.java:437)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:222)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
>         at com.sun.proxy.$Proxy15.updateContainerState(Unknown Source)
>         at 
> org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.updateContainerState(ContainerManagerImpl.java:273)
>         at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerState(AbstractContainerReportHandler.java:227)
>         at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:96)
>         at 
> org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler.onMessage(IncrementalContainerReportHandler.java:88)
>         at 
> org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler.onMessage(IncrementalContainerReportHandler.java:40)
>         at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2022-02-01 11:54:35,423 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #290631
> 2022-02-01 11:54:35,424 WARN org.apache.hadoop.hdds.scm.ha.SCMContext: 
> getTerm is invoked when not leader.
> 2022-02-01 11:54:35,424 WARN 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Skip sending 
> close container command, since current SCM is not leader.
> org.apache.ratis.protocol.exceptions.NotLeaderException: Server 
> xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx@group-XXXXXXXXXXXX is not the leader 
> xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx|rpc:xxxxxx.xxxxx.xxxxxxxx.com:9894|admin:|client:|dataStream:|priority:0
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.triggerNotLeaderException(SCMRatisServerImpl.java:278)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMContext.getTermOfLeader(SCMContext.java:191)
>         at 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:85)
>         at 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:50)
>         at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
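
The log above shows two handlers behaving differently on a non-leader SCM:
IncrementalContainerReportHandler submits the state update to Ratis and gets
a NotLeaderException with a full stack trace, while
CloseContainerEventHandler appears to detect that the SCM is not the leader
and skips the command. The second pattern might be sketched in Java as
below. Every name here (ConsensusServer, FakeServer, submitIfLeader, and the
local NotLeaderException) is a hypothetical stand-in rather than an Ozone or
Ratis API, and this is not claimed to be the change that resolved HDDS-6241.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaderGuardSketch {

  /** Hypothetical exception; stands in for a "not the leader" rejection. */
  public static class NotLeaderException extends Exception {
    public NotLeaderException(String message) {
      super(message);
    }
  }

  /** Hypothetical view of the local consensus server (not the Ratis API). */
  public interface ConsensusServer {
    boolean isLeader();

    void submit(String request) throws NotLeaderException;
  }

  /** In-memory stub that counts accepted and rejected submissions. */
  public static class FakeServer implements ConsensusServer {
    private final boolean leader;
    public final List<String> accepted = new ArrayList<>();
    public int rejected = 0;

    public FakeServer(boolean leader) {
      this.leader = leader;
    }

    @Override
    public boolean isLeader() {
      return leader;
    }

    @Override
    public void submit(String request) throws NotLeaderException {
      if (!leader) {
        rejected++;
        throw new NotLeaderException("not the leader: " + request);
      }
      accepted.add(request);
    }
  }

  /**
   * Submits the request only when the local server reports itself as leader.
   * Returns true if the request was accepted, false if it was skipped or
   * rejected.
   */
  public static boolean submitIfLeader(ConsensusServer server, String request) {
    if (!server.isLeader()) {
      return false; // follower: skip without contacting the consensus layer
    }
    try {
      server.submit(request);
      return true;
    } catch (NotLeaderException e) {
      // Leadership can change between the check and the submit.
      return false;
    }
  }

  public static void main(String[] args) {
    FakeServer follower = new FakeServer(false);
    FakeServer leader = new FakeServer(true);
    System.out.println("follower submitted: "
        + submitIfLeader(follower, "close container #290631"));
    System.out.println("leader submitted: "
        + submitIfLeader(leader, "close container #290631"));
    System.out.println("follower rejections: " + follower.rejected);
  }
}
```

With a guard like this, a follower never reaches the submit call, so no
rejection is generated for skipped requests. The try/catch remains because
a leadership check and a later submit are not atomic.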



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
