Ivan Andika created HDDS-11750:
----------------------------------
Summary: LegacyReplicationManager#notifyStatusChanged should not
send Ratis request
Key: HDDS-11750
URL: https://issues.apache.org/jira/browse/HDDS-11750
Project: Apache Ozone
Issue Type: Bug
Components: SCM HA
Reporter: Ivan Andika
We encountered an issue where the SCM is stuck after transfer leadership,
causing all SCM client requests to timeout (including OMs).
We saw that SCM is throwing TimeoutException in StateMachineUpdater (the thread
in charge of applying Raft logs and completing user requests), causing the
whole SCM request processing to be stuck.
{code:java}
2024-11-18 15:54:50,183
[daa4f362-f48d-4933-96b3-840a8739f1d9@group-C0BCE64451CF-StateMachineUpdater]
INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: can not remove
source replica after successfully replicated to target datanode
2024-11-18 15:54:50,745
[EventQueue-CloseContainerForCloseContainerEventHandler] ERROR
org.apache.hadoop.hdds.server.events.SingleThreadExecutor: Error on execution
message #8764007
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy16.updateContainerState(Unknown Source)
at
org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.updateContainerState(ContainerManagerImpl.java:332)
at
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:82)
at
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:51)
at
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
at
org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
at
org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
at
org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
... 8 more
2024-11-18 15:54:50,746 [EventQueue-DeleteBlockStatusForDeletedBlockLogImpl]
WARN org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager:
Could not commit delete block transactions: []
java.util.concurrent.TimeoutException
at
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
at
org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
at
org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
at
org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
at com.sun.proxy.$Proxy17.removeTransactionsFromDB(Unknown Source)
at
org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager.commitTransactions(SCMDeletedBlockTransactionStatusManager.java:527)
at
org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl.onMessage(DeletedBlockLogImpl.java:384)
at
org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl.onMessage(DeletedBlockLogImpl.java:73)
at
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) {code}
We found that the root cause this following call chains
* ReplicationManager#notifyStatusChanged (due to the transfer leadership)
** LegacyReplicationManager#notifyStatusChanged
*** LegacyReplicationManager#onLeaderReadyAndOutofSafeMode
**** LegacyReplicationManager#deleteSrcDnForMove
***** LegacyReplicationManager.MoveScheduler#completeMove (@Replicated
annotation means that it will submit Ratis request)
****** SCMHAInvocationHandler#invokeRatisServer
******* SCMRatiServerImpl#submitRequest
******** RaftServerImpl#submitClientRequestAsync
We should never send a Ratis request under the
ReplicationManager#notifyStatusChanged since this will causes a deadlock with
Ratis StateMachineUpdater. When ReplicationManager#notifyStatusChanged call the
MoveScheduler#completeMove, it will send a Ratis request to the Raft server and
wait until the log associated with it is applied by the StateMachineUpdater.
However, since ReplicationManger#notifyStatusChanged is run under the
StateMachineUpdater, it will block the StateMachineUpdater itself, meaning that
the Raft log associated with request sent by MoveScheduler#completeMove will
never be applied and there will be a deadlock. This will cause
StateMachineUpdater to get stuck and most SCM client requests to timeout in the
StateMachineUpdater.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]