Ivan Andika created HDDS-11750:
----------------------------------

             Summary: LegacyReplicationManager#notifyStatusChanged should not 
send Ratis request
                 Key: HDDS-11750
                 URL: https://issues.apache.org/jira/browse/HDDS-11750
             Project: Apache Ozone
          Issue Type: Bug
          Components: SCM HA
            Reporter: Ivan Andika


We encountered an issue where the SCM gets stuck after a leadership transfer, 
causing all SCM client requests to time out (including those from OMs).

We saw that the SCM throws TimeoutException in the StateMachineUpdater (the 
thread in charge of applying Raft log entries and completing user requests), 
stalling all SCM request processing.

 
{code:java}
2024-11-18 15:54:50,183 
[daa4f362-f48d-4933-96b3-840a8739f1d9@group-C0BCE64451CF-StateMachineUpdater] 
INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: can not remove 
source replica after successfully replicated to target datanode
2024-11-18 15:54:50,745 
[EventQueue-CloseContainerForCloseContainerEventHandler] ERROR 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor: Error on execution 
message #8764007
java.lang.reflect.UndeclaredThrowableException
        at com.sun.proxy.$Proxy16.updateContainerState(Unknown Source)
        at 
org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.updateContainerState(ContainerManagerImpl.java:332)
        at 
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:82)
        at 
org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:51)
        at 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
        at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
        at 
org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
        at 
org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
        at 
org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
        ... 8 more
2024-11-18 15:54:50,746 [EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] 
WARN org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager: 
Could not commit delete block transactions: []
java.util.concurrent.TimeoutException
        at 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
        at 
org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
        at 
org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
        at 
org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
        at com.sun.proxy.$Proxy17.removeTransactionsFromDB(Unknown Source)
        at 
org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager.commitTransactions(SCMDeletedBlockTransactionStatusManager.java:527)
        at 
org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl.onMessage(DeletedBlockLogImpl.java:384)
        at 
org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl.onMessage(DeletedBlockLogImpl.java:73)
        at 
org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748) {code}
 

We found that the root cause is the following call chain:
 * ReplicationManager#notifyStatusChanged (due to the transfer leadership)
 ** LegacyReplicationManager#notifyStatusChanged 
 *** LegacyReplicationManager#onLeaderReadyAndOutofSafeMode
 **** LegacyReplicationManager#deleteSrcDnForMove
 ***** LegacyReplicationManager.MoveScheduler#completeMove (the @Replicated 
annotation means it will submit a Ratis request)
 ****** SCMHAInvocationHandler#invokeRatis
 ******* SCMRatisServerImpl#submitRequest
 ******** RaftServerImpl#submitClientRequestAsync
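The chain above can be reduced to a small, self-contained sketch (plain JDK code, not Ozone classes): a single-threaded executor stands in for Ratis' StateMachineUpdater, and {{submitRequest}} stands in for SCMRatisServerImpl#submitRequest. A callback running on the updater thread that blocks on its own submission is guaranteed to time out, because the only thread that could apply the entry is the one waiting.

{code:java}
import java.util.concurrent.*;

// Minimal sketch (not Ozone code) of the deadlock: a single "updater" thread
// applies entries one at a time. If a callback invoked ON that thread submits
// a new entry and then blocks waiting for it to be applied, the entry can
// never be applied -- the only applier thread is the one that is blocked.
public class SelfSubmitDeadlock {
    // Stand-in for Ratis' StateMachineUpdater.
    static final ExecutorService updater = Executors.newSingleThreadExecutor();

    // Stand-in for SCMRatisServerImpl#submitRequest: schedules the "apply"
    // on the updater thread, returns a future completed once applied.
    static CompletableFuture<String> submitRequest(String op) {
        CompletableFuture<String> applied = new CompletableFuture<>();
        updater.execute(() -> applied.complete(op + ": applied"));
        return applied;
    }

    // Returns true if the self-submission timed out, as the analysis predicts.
    static boolean demo() throws Exception {
        Future<Boolean> statusChange = updater.submit(() -> {
            try {
                // notifyStatusChanged-style work running on the updater
                // thread, blocking on an apply only that thread can perform.
                submitRequest("completeMove").get(1, TimeUnit.SECONDS);
                return false; // unreachable in this setup
            } catch (TimeoutException e) {
                return true;  // updater deadlocked on itself
            }
        });
        boolean timedOut = statusChange.get();
        updater.shutdownNow();
        return timedOut;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()
            ? "TimeoutException: updater deadlocked on itself"
            : "no deadlock");
    }
}
{code}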

We should never send a Ratis request from within 
ReplicationManager#notifyStatusChanged, since this causes a deadlock with the 
Ratis StateMachineUpdater. When ReplicationManager#notifyStatusChanged calls 
MoveScheduler#completeMove, it sends a Ratis request to the Raft server and 
waits until the log entry associated with it is applied by the 
StateMachineUpdater. However, since ReplicationManager#notifyStatusChanged 
itself runs on the StateMachineUpdater thread, it blocks the 
StateMachineUpdater, meaning the Raft log entry for the request sent by 
MoveScheduler#completeMove can never be applied: a deadlock. As a result, the 
StateMachineUpdater is stuck and most SCM client requests time out waiting on 
it.
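One possible direction for a fix (a sketch only, not the actual Ozone patch): have notifyStatusChanged hand the Ratis-submitting work off to a separate thread and return immediately, so the StateMachineUpdater stays free to apply log entries, including the one the offloaded task waits on. The class and method names below are illustrative, not taken from the codebase.

{code:java}
import java.util.concurrent.*;

// Hedged sketch: same setup as the deadlock scenario, but the callback
// running on the updater thread only SCHEDULES the Ratis-submitting work on
// a background thread instead of blocking on it. The updater returns, stays
// free to apply the entry, and the request completes.
public class OffloadOnStatusChange {
    static final ExecutorService updater = Executors.newSingleThreadExecutor();
    static final ExecutorService background = Executors.newSingleThreadExecutor();

    // Stand-in for SCMRatisServerImpl#submitRequest, as before.
    static CompletableFuture<String> submitRequest(String op) {
        CompletableFuture<String> applied = new CompletableFuture<>();
        updater.execute(() -> applied.complete(op + ": applied"));
        return applied;
    }

    // Returns true if the request completed instead of deadlocking.
    static boolean demo() throws Exception {
        CompletableFuture<String> result = new CompletableFuture<>();
        // notifyStatusChanged-style callback on the updater thread: it only
        // enqueues the blocking work on the background executor and returns.
        updater.submit(() -> background.execute(() -> {
            try {
                result.complete(
                    submitRequest("completeMove").get(1, TimeUnit.SECONDS));
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        })).get();
        String value = result.get(2, TimeUnit.SECONDS);
        updater.shutdownNow();
        background.shutdownNow();
        return value.endsWith("applied");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo() ? "completed without deadlock" : "failed");
    }
}
{code}

The trade-off is that completeMove then runs asynchronously with respect to the status change, so any ordering the caller relied on would need to be revisited.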



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
