[
https://issues.apache.org/jira/browse/HDDS-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika resolved HDDS-11750.
--------------------------------
Resolution: Won't Fix
This can be fixed by switching to ReplicationManager V2.
> LegacyReplicationManager#notifyStatusChanged should not submit Ratis request
> ----------------------------------------------------------------------------
>
> Key: HDDS-11750
> URL: https://issues.apache.org/jira/browse/HDDS-11750
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM HA
> Affects Versions: 1.2.0, 1.3.0, 1.4.0
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Critical
> Labels: pull-request-available
>
> We encountered an issue where the SCM gets stuck after a leadership transfer:
> the new SCM leader hangs and all client requests (including those from OMs)
> time out.
> We saw that the SCM throws a TimeoutException in the StateMachineUpdater (the
> thread in charge of applying Raft logs and completing user requests), which
> stalls all SCM request processing.
> {code:java}
> 2024-11-18 15:54:50,182
> [daa4f362-f48d-4933-96b3-840a8739f1d9@group-C0BCE64451CF-StateMachineUpdater]
> ERROR org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception
> while cleaning up excess replicas.
> java.util.concurrent.TimeoutException
> at
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> at
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
> at
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
> at
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
> at com.sun.proxy.$Proxy19.completeMove(Unknown Source)
> at
> org.apache.hadoop.hdds.scm.container.ReplicationManager.deleteSrcDnForMove(ReplicationManager.java:1610)
> at
> org.apache.hadoop.hdds.scm.container.ReplicationManager.lambda$onLeaderReadyAndOutOfSafeMode$36(ReplicationManager.java:2364)
> at
> java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
> at
> org.apache.hadoop.hdds.scm.container.ReplicationManager.onLeaderReadyAndOutOfSafeMode(ReplicationManager.java:2342)
> at
> org.apache.hadoop.hdds.scm.container.ReplicationManager.notifyStatusChanged(ReplicationManager.java:2103)
> at
> org.apache.hadoop.hdds.scm.ha.SCMServiceManager.notifyStatusChanged(SCMServiceManager.java:53)
> at
> org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyTermIndexUpdated(SCMStateMachine.java:338)
> at
> org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1650)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
> at java.lang.Thread.run(Thread.java:748)
> 2024-11-18 15:54:50,183
> [daa4f362-f48d-4933-96b3-840a8739f1d9@group-C0BCE64451CF-StateMachineUpdater]
> INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: can not remove
> source replica after successfully replicated to target datanode
> 2024-11-18 15:54:50,745
> [EventQueue-CloseContainerForCloseContainerEventHandler] ERROR
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor: Error on execution
> message #8764007
> java.lang.reflect.UndeclaredThrowableException
> at com.sun.proxy.$Proxy16.updateContainerState(Unknown Source)
> at
> org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.updateContainerState(ContainerManagerImpl.java:332)
> at
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:82)
> at
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:51)
> at
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.TimeoutException
> at
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> at
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
> at
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
> at
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
> ... 8 more
> 2024-11-18 15:54:50,746 [EventQueue-DeleteBlockStatusForDeletedBlockLogImpl]
> WARN
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager:
> Could not commit delete block transactions: []
> java.util.concurrent.TimeoutException
> at
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> at
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
> at
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
> at
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
> at com.sun.proxy.$Proxy17.removeTransactionsFromDB(Unknown Source)
> at
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager.commitTransactions(SCMDeletedBlockTransactionStatusManager.java:527)
> at
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl.onMessage(DeletedBlockLogImpl.java:384)
> at
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl.onMessage(DeletedBlockLogImpl.java:73)
> at
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748) {code}
> We found that the root cause is the following call chain.
> * StateMachine#notifyTermIndexUpdated (due to the transfer leadership)
> ** ReplicationManager#notifyStatusChanged
> *** LegacyReplicationManager#notifyStatusChanged
> **** LegacyReplicationManager#onLeaderReadyAndOutOfSafeMode
> ***** LegacyReplicationManager#deleteSrcDnForMove
> ****** LegacyReplicationManager.MoveScheduler#completeMove (@Replicated
> annotation means that it will submit Ratis request)
> ******* SCMHAInvocationHandler#invokeRatisServer
> ******** SCMRatisServerImpl#submitRequest
> ********* RaftServerImpl#submitClientRequestAsync
> We should never send a Ratis request from within
> ReplicationManager#notifyStatusChanged, since this causes a deadlock with the
> Ratis StateMachineUpdater. When ReplicationManager#notifyStatusChanged calls
> MoveScheduler#completeMove, it sends a Ratis request to the Raft server and
> waits until the associated log entry is applied by the StateMachineUpdater.
> However, since ReplicationManager#notifyStatusChanged itself runs on the
> StateMachineUpdater thread, it blocks that thread, so the Raft log entry for
> the request sent by MoveScheduler#completeMove can never be applied and
> there is a deadlock. The StateMachineUpdater gets stuck and most SCM client
> requests time out waiting for it.
> Currently, one possible fix might be to remove the
> onLeaderReadyAndOutOfSafeMode implementation altogether, in the hope that the
> inflight moves will be picked up by the main ReplicationManager thread.
> Note: The issue should STILL happen after HDDS-10690. Although
> StateMachine#notifyTermIndexUpdated will no longer trigger
> ReplicationManager#notifyStatusChanged, StateMachine#notifyLeaderReady will
> trigger it instead, and StateMachine#notifyLeaderReady is still called on the
> StateMachineUpdater thread through RaftServerImpl#applyLogToStateMachine.
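
The self-wait described in the issue can be reproduced in isolation. The sketch below is only an illustration and uses no Ozone or Ratis code: a single-thread executor stands in for the StateMachineUpdater, and the class and method names (ApplierSelfWaitSketch, submitAndWait, waitOnApplierThread, waitOnSeparateThread) are hypothetical. The second method shows a generic hand-off to another thread. That is an assumption about how such a wait could be avoided in general, not the fix proposed in the issue (removing onLeaderReadyAndOutOfSafeMode) and not the resolution (switching to ReplicationManager V2).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ApplierSelfWaitSketch {

  /**
   * Stand-in for a replicated call: the request only completes once the
   * applier thread has run it. Returns false if the wait times out.
   */
  static boolean submitAndWait(ExecutorService applier, long timeoutMs)
      throws InterruptedException, ExecutionException {
    CompletableFuture<Void> applied =
        CompletableFuture.runAsync(() -> { }, applier);
    try {
      applied.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      return false;
    }
  }

  /** The callback waits for the request on the applier thread itself. */
  static boolean waitOnApplierThread() throws Exception {
    ExecutorService applier = Executors.newSingleThreadExecutor();
    try {
      // The callback occupies the only applier thread, so the request queued
      // behind it cannot be applied until the callback gives up.
      Callable<Boolean> callback = () -> submitAndWait(applier, 200);
      return applier.submit(callback).get();
    } finally {
      applier.shutdownNow();
    }
  }

  /** The callback only schedules the request on a worker and returns. */
  static boolean waitOnSeparateThread() throws Exception {
    ExecutorService applier = Executors.newSingleThreadExecutor();
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      Callable<Boolean> replicatedCall = () -> submitAndWait(applier, 2000);
      Callable<Future<Boolean>> callback = () -> worker.submit(replicatedCall);
      // The applier thread is free again as soon as the callback returns.
      Future<Boolean> handedOff = applier.submit(callback).get();
      return handedOff.get(5, TimeUnit.SECONDS);
    } finally {
      worker.shutdownNow();
      applier.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("wait on applier thread completed: " + waitOnApplierThread());
    System.out.println("wait on separate thread completed: " + waitOnSeparateThread());
  }
}
```

In the first case the wait can only end through its timeout, which matches the TimeoutException raised from SCMRatisServerImpl.submitRequest in the trace above. In the second case the applier thread is released before anything waits on it, so the request is applied normally.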