[ https://issues.apache.org/jira/browse/HDDS-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Andika resolved HDDS-11750.
--------------------------------
    Resolution: Won't Fix

This can be fixed by switching to ReplicationManager V2.
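A hedged example of what that switch could look like in ozone-site.xml; the property name hdds.scm.replication.enable.legacy is my assumption here and should be verified against the release in use:

{code:xml}
<!-- Assumption: this property toggles the legacy replication manager off so
     that the new ReplicationManager (V2) handles replication and moves. -->
<property>
  <name>hdds.scm.replication.enable.legacy</name>
  <value>false</value>
</property>
{code}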

> LegacyReplicationManager#notifyStatusChanged should not submit Ratis request
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-11750
>                 URL: https://issues.apache.org/jira/browse/HDDS-11750
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM HA
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Critical
>              Labels: pull-request-available
>
> We encountered an issue where the SCM gets stuck after a leadership transfer, 
> causing the SCM leader to hang and all client requests (including those from 
> the OMs) to time out.
> The SCM throws TimeoutException in the StateMachineUpdater (the thread in 
> charge of applying Raft logs and completing user requests), which stalls all 
> SCM request processing.
> {code:java}
> 2024-11-18 15:54:50,182 
> [daa4f362-f48d-4933-96b3-840a8739f1d9@group-C0BCE64451CF-StateMachineUpdater] 
> ERROR org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception 
> while cleaning up excess replicas.
> java.util.concurrent.TimeoutException
>         at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
>         at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
>         at com.sun.proxy.$Proxy19.completeMove(Unknown Source)
>         at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.deleteSrcDnForMove(ReplicationManager.java:1610)
>         at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.lambda$onLeaderReadyAndOutOfSafeMode$36(ReplicationManager.java:2364)
>         at 
> java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1597)
>         at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.onLeaderReadyAndOutOfSafeMode(ReplicationManager.java:2342)
>         at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.notifyStatusChanged(ReplicationManager.java:2103)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMServiceManager.notifyStatusChanged(SCMServiceManager.java:53)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyTermIndexUpdated(SCMStateMachine.java:338)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1650)
>         at 
> org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
>         at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
>         at java.lang.Thread.run(Thread.java:748)
> 2024-11-18 15:54:50,183 
> [daa4f362-f48d-4933-96b3-840a8739f1d9@group-C0BCE64451CF-StateMachineUpdater] 
> INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: can not remove 
> source replica after successfully replicated to target datanode
> 2024-11-18 15:54:50,745 
> [EventQueue-CloseContainerForCloseContainerEventHandler] ERROR 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor: Error on execution 
> message #8764007
> java.lang.reflect.UndeclaredThrowableException
>         at com.sun.proxy.$Proxy16.updateContainerState(Unknown Source)
>         at 
> org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.updateContainerState(ContainerManagerImpl.java:332)
>         at 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:82)
>         at 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler.onMessage(CloseContainerEventHandler.java:51)
>         at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.TimeoutException
>         at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
>         at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
>         ... 8 more
> 2024-11-18 15:54:50,746 [EventQueue-DeleteBlockStatusForDeletedBlockLogImpl] 
> WARN 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager: 
> Could not commit delete block transactions: []
> java.util.concurrent.TimeoutException
>         at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
>         at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:228)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
>         at com.sun.proxy.$Proxy17.removeTransactionsFromDB(Unknown Source)
>         at 
> org.apache.hadoop.hdds.scm.block.SCMDeletedBlockTransactionStatusManager.commitTransactions(SCMDeletedBlockTransactionStatusManager.java:527)
>         at 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl.onMessage(DeletedBlockLogImpl.java:384)
>         at 
> org.apache.hadoop.hdds.scm.block.DeletedBlockLogImpl.onMessage(DeletedBlockLogImpl.java:73)
>         at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748) {code}
> We found that the root cause is the following call chain:
>  * StateMachine#notifyTermIndexUpdated (due to the transfer leadership)
>  ** ReplicationManager#notifyStatusChanged
>  *** LegacyReplicationManager#notifyStatusChanged 
>  **** LegacyReplicationManager#onLeaderReadyAndOutOfSafeMode
>  ***** LegacyReplicationManager#deleteSrcDnForMove
>  ****** LegacyReplicationManager.MoveScheduler#completeMove (the @Replicated 
> annotation means it will submit a Ratis request)
>  ******* SCMHAInvocationHandler#invokeRatis
>  ******** SCMRatisServerImpl#submitRequest
>  ********* RaftServerImpl#submitClientRequestAsync
> We should never send a Ratis request under 
> ReplicationManager#notifyStatusChanged, since this deadlocks with the Ratis 
> StateMachineUpdater. When ReplicationManager#notifyStatusChanged calls 
> MoveScheduler#completeMove, it sends a Ratis request to the Raft server and 
> waits until the log entry associated with it is applied by the 
> StateMachineUpdater. However, since ReplicationManager#notifyStatusChanged 
> itself runs on the StateMachineUpdater thread, it blocks the 
> StateMachineUpdater, so the Raft log entry for the request sent by 
> MoveScheduler#completeMove can never be applied: a deadlock (see the sketch 
> below). This leaves the StateMachineUpdater stuck and most SCM client 
> requests timing out.
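> A minimal, self-contained sketch of that pattern (class and method names are 
> illustrative, not the actual SCM code): the single updater thread submits a 
> request and blocks on the resulting future, but only that same thread can 
> complete the future by applying the corresponding log entry, so the wait 
> times out (or hangs forever without a timeout).
> {code:java}
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.TimeoutException;
> 
> public class SelfDeadlockDemo {
>   // Pending "Raft log entries": each future completes when the updater applies it.
>   private static final BlockingQueue<CompletableFuture<Void>> LOG = new LinkedBlockingQueue<>();
> 
>   // Stand-in for SCMRatisServerImpl#submitRequest: append an entry, wait for apply.
>   static void submitAndWait() throws Exception {
>     CompletableFuture<Void> applied = new CompletableFuture<>();
>     LOG.add(applied);
>     applied.get(3, TimeUnit.SECONDS); // blocks the calling thread
>   }
> 
>   public static void main(String[] args) throws Exception {
>     Thread updater = new Thread(() -> {
>       try {
>         // Stand-in for applyLog -> notifyStatusChanged: the callback itself
>         // submits a new request and waits for it on this very thread.
>         submitAndWait();
>         // Never reached: only this thread could have "applied" the entry above.
>         LOG.take().complete(null);
>       } catch (TimeoutException e) {
>         System.out.println("TimeoutException, as in the SCM log: " + e);
>       } catch (Exception e) {
>         throw new RuntimeException(e);
>       }
>     }, "StateMachineUpdater");
>     updater.start();
>     updater.join();
>   }
> }
> {code}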
> One possible fix might be to remove the onLeaderReadyAndOutOfSafeMode 
> implementation altogether, so the inflight moves are instead picked up by the 
> main ReplicationManager thread; another direction is sketched after this 
> paragraph.
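> As an alternative illustration only (not the removal proposed above, and the 
> names below are placeholders, not the actual SCM API): hand the inflight-move 
> cleanup to a separate executor so notifyStatusChanged returns immediately and 
> the Ratis request is never submitted from the StateMachineUpdater thread.
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> 
> // Sketch: never block the Raft StateMachineUpdater thread on another Ratis request.
> public class MoveCleanupSketch {
>   private final ExecutorService cleanupExecutor =
>       Executors.newSingleThreadExecutor(r -> new Thread(r, "rm-move-cleanup"));
> 
>   // Called from notifyStatusChanged, i.e. on the StateMachineUpdater thread.
>   public void onLeaderReadyAndOutOfSafeMode() {
>     // Only schedule the work here; the Ratis request (completeMove etc.) is
>     // submitted from cleanupExecutor, so the updater thread keeps applying
>     // logs and can eventually apply the entry this work waits on.
>     cleanupExecutor.submit(this::cleanUpInflightMoves);
>   }
> 
>   private void cleanUpInflightMoves() {
>     // Placeholder for deleteSrcDnForMove / MoveScheduler#completeMove, which
>     // may safely block on a Ratis future on this thread.
>   }
> }
> {code}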
> Note: the issue should STILL happen after HDDS-10690. Although 
> StateMachine#notifyTermIndexUpdated will no longer trigger 
> ReplicationManager#notifyStatusChanged, StateMachine#notifyLeaderReady will 
> trigger ReplicationManager#notifyStatusChanged instead, and 
> StateMachine#notifyLeaderReady is still called on the StateMachineUpdater 
> thread through RaftServerImpl#applyLogToStateMachine.


