[ https://issues.apache.org/jira/browse/HDDS-11291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-11291:
----------------------------------
    Labels: pull-request-available  (was: )

> Datanode Command Handler blocked by executing ratis requests
> ------------------------------------------------------------
>
>                 Key: HDDS-11291
>                 URL: https://issues.apache.org/jira/browse/HDDS-11291
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Janus Chow
>            Assignee: Janus Chow
>            Priority: Major
>              Labels: pull-request-available
>
> We hit the following issue: the Datanode command handler was executing a close 
> container request, but its timeout logic did not take effect, so it blocked all 
> subsequent requests from SCM.
> The jstack output is as follows:
> {code:java}
> "Command processor thread" #215 daemon prio=5 os_prio=0 tid=0x00007fcef3262000 nid=0xa56 waiting on condition [0x00007fcf63f9d000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007fd4ab6dcd38> (a java.util.concurrent.CompletableFuture$Signaller)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>         at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>         at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>         at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
>         at org.apache.ratis.server.impl.RaftServerImpl.executeSubmitClientRequestAsync(RaftServerImpl.java:816)
>         at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitClientRequestAsync$7(RaftServerProxy.java:436)
>         at org.apache.ratis.server.impl.RaftServerProxy$$Lambda$827/1961332062.apply(Unknown Source)
>         at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:995)
>         at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2137)
>         at org.apache.ratis.server.impl.RaftServerProxy.submitClientRequestAsync(RaftServerProxy.java:436)
>         at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.submitRequest(XceiverServerRatis.java:611)
>         at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler.handle(CloseContainerCommandHandler.java:105)
>         at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:103)
>         at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$3(DatanodeStateMachine.java:593)
>         at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine$$Lambda$270/1788388131.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
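> The direct cause is that the timeout logic never takes effect. In Ratis, 
> executeSubmitClientRequestAsync performs a blocking join(). Because 
> RaftServerProxy.submitClientRequestAsync reaches it through thenCompose on an 
> already-completed future, that join() runs on the calling thread (here the 
> command processor thread). The outer CompletableFuture is therefore not even 
> returned until the request finishes, so a timeout applied to it never fires.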
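The blocking pattern can be reproduced with plain java.util.concurrent. This is a minimal, hypothetical model, not Ratis or Ozone code: `submitAsync` only imitates the shape of `RaftServerProxy.submitClientRequestAsync` seen in the jstack, and the 100 ms timeout and 500 ms request latency are arbitrary. It shows that a bounded `get` on the returned future cannot help when the method blocks before returning. It also sketches one possible mitigation, moving the call onto a separate executor; that is an assumption for illustration and not necessarily what the linked pull request does.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class JoinBlocksTimeoutDemo {

  // Hypothetical stand-in for RaftServerProxy.submitClientRequestAsync: the
  // source stage is already complete, so thenCompose runs the lambda on the
  // calling thread, and inner.join() parks that thread before any future is
  // handed back to the caller (the same shape as the jstack above).
  static CompletableFuture<String> submitAsync(CompletableFuture<String> inner) {
    return CompletableFuture.completedFuture("impl")
        .thenCompose(impl -> CompletableFuture.completedFuture(inner.join()));
  }

  // Returns a future that a helper thread completes after delayMs.
  static CompletableFuture<String> completeLater(long delayMs, ScheduledExecutorService timer) {
    CompletableFuture<String> f = new CompletableFuture<>();
    timer.schedule(() -> { f.complete("done"); }, delayMs, TimeUnit.MILLISECONDS);
    return f;
  }

  // The caller asks for a 100 ms timeout, but submitAsync() itself does not
  // return until the inner request completes (~500 ms). By then the returned
  // future is already done, so get() succeeds and no TimeoutException occurs.
  static long callerThreadWaitMillis() throws Exception {
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    try {
      long start = System.nanoTime();
      submitAsync(completeLater(500, timer)).get(100, TimeUnit.MILLISECONDS);
      return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    } finally {
      timer.shutdownNow();
    }
  }

  // Possible mitigation (assumption): perform the blocking submission on a
  // separate executor so the caller only waits on a future it already holds,
  // which lets the bounded get() time out after ~100 ms.
  static boolean offloadedCallTimesOut() throws Exception {
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    ExecutorService submitter = Executors.newSingleThreadExecutor();
    CompletableFuture<String> inner = completeLater(500, timer);
    try {
      CompletableFuture.supplyAsync(() -> submitAsync(inner), submitter)
          .thenCompose(f -> f)
          .get(100, TimeUnit.MILLISECONDS);
      return false;
    } catch (TimeoutException e) {
      return true;
    } finally {
      inner.complete("abandoned"); // unblock the worker's join() in this demo
      submitter.shutdown();
      timer.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("caller waited " + callerThreadWaitMillis() + " ms despite a 100 ms timeout");
    System.out.println("offloaded call timed out: " + offloadedCallTimesOut());
  }
}
```

The offloading variant only frees the caller: the worker thread stays parked in join() until the underlying request completes, so in a real fix that wait would still need to be bounded or the request cancelled.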



--
This message was sent by Atlassian Jira (v8.20.10#820010)
