[ 
https://issues.apache.org/jira/browse/HDDS-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809583#comment-17809583
 ] 

Wei-Chiu Chuang commented on HDDS-10176:
----------------------------------------

cc: [~szetszwo]  might be a Ratis bug.

> OzoneManagerRatisServer waiting for LeaderState to stop
> -------------------------------------------------------
>
>                 Key: HDDS-10176
>                 URL: https://issues.apache.org/jira/browse/HDDS-10176
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Attila Doroszlai
>            Priority: Major
>         Attachments: 2024-01-20T18-15-24_117-jvmRun1.dump, 
> org.apache.hadoop.ozone.om.TestOzoneManagerPrepare-output.txt
>
>
> Cluster shutdown is waiting for executor to shutdown:
> {code}
> "main" 
>    java.lang.Thread.State: TIMED_WAITING
>         at sun.misc.Unsafe.park(Native Method)
>         at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>         at 
> java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
>         at 
> org.apache.ratis.util.ConcurrentUtils.shutdownAndWait(ConcurrentUtils.java:144)
>         at 
> org.apache.ratis.util.ConcurrentUtils.shutdownAndWait(ConcurrentUtils.java:136)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$close$9(RaftServerProxy.java:438)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy$$Lambda$1625/1179968371.run(Unknown
>  Source)
>         at 
> org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$7(LifeCycle.java:306)
>         at 
> org.apache.ratis.util.LifeCycle$$Lambda$1127/1377650693.get(Unknown Source)
>         at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:326)
>         at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:304)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy.close(RaftServerProxy.java:415)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.stop(OzoneManagerRatisServer.java:597)
>         at 
> org.apache.hadoop.ozone.om.OzoneManager.stop(OzoneManager.java:2226)
>         at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.stopOM(MiniOzoneClusterImpl.java:557)
>         at 
> org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.stop(MiniOzoneHAClusterImpl.java:311)
>         at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.shutdown(MiniOzoneClusterImpl.java:453)
>         at 
> org.apache.hadoop.ozone.om.TestOzoneManagerHA.shutdown(TestOzoneManagerHA.java:209)
> {code}
> Ratis might wait up to 1 day:
> {code:title=https://github.com/apache/ratis/blob/ratis-3.0.1/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerProxy.java#L438}
>         ConcurrentUtils.shutdownAndWait(executor.get());
> {code}
> {code:title=https://github.com/apache/ratis/blob/ratis-3.0.1/ratis-common/src/main/java/org/apache/ratis/util/ConcurrentUtils.java#L135-L136}
>   static void shutdownAndWait(ExecutorService executor) {
>     shutdownAndWait(TimeDuration.ONE_DAY, executor, timeout -> {
> {code}
> The executor is running a task that's also waiting:
> {code}
> "omNode-3-impl-thread1" 
>    java.lang.Thread.State: WAITING
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>         at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>         at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>         at 
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$close$3(RaftServerImpl.java:543)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl$$Lambda$1628/1400495203.run(Unknown
>  Source)
>         at 
> org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$7(LifeCycle.java:306)
>         at 
> org.apache.ratis.util.LifeCycle$$Lambda$1127/1377650693.get(Unknown Source)
>         at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:326)
>         at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:304)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.close(RaftServerImpl.java:525)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.close(RaftServerProxy.java:136)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.lambda$close$0(RaftServerProxy.java:123)
>         at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap$$Lambda$1626/677696333.accept(Unknown
>  Source)
>         at 
> org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:203)
>         at 
> org.apache.ratis.util.ConcurrentUtils.lambda$null$4(ConcurrentUtils.java:182)
>         at 
> org.apache.ratis.util.ConcurrentUtils$$Lambda$790/801783477.run(Unknown 
> Source)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
> {code}
> {code:title=https://github.com/apache/ratis/blob/ratis-3.0.1/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L543}
>         role.shutdownLeaderState(true).join();
> {code}
> {code:title=log}
> 2024-01-20 18:19:01,239 [main] INFO  ratis.OzoneManagerRatisServer 
> (OzoneManagerRatisServer.java:stop(594)) - Stopping 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer@299dd381 at port 
> 15058
> 2024-01-20 18:19:01,239 [omNode-3@group-523986131536-LeaderElection22] INFO  
> segmented.SegmentedRaftLogWorker 
> (SegmentedRaftLogWorker.java:rollLogSegment(435)) - 
> omNode-3@group-523986131536-SegmentedRaftLogWorker: Rolling segment log-47_48 
> to index:48
> 2024-01-20 18:19:01,239 [main] INFO  server.RaftServer 
> (RaftServerProxy.java:lambda$close$9(416)) - omNode-3: close
> 2024-01-20 18:19:01,239 [main] INFO  server.GrpcService 
> (GrpcService.java:closeImpl(311)) - omNode-3: shutdown server 
> GrpcServerProtocolService now
> 2024-01-20 18:19:01,239 [omNode-3@group-523986131536-LeaderElection22] INFO  
> server.RaftServer$Division (ServerState.java:setRaftConf(386)) - 
> omNode-3@group-523986131536: set configuration 49: 
> peers:[omNode-3|localhost:15058, omNode-1|localhost:15050, 
> omNode-2|localhost:15054]|listeners:[], old=null
> 2024-01-20 18:19:01,239 [omNode-3@group-523986131536-SegmentedRaftLogWorker] 
> INFO  segmented.SegmentedRaftLogWorker 
> (SegmentedRaftLogWorker.java:execute(591)) - 
> omNode-3@group-523986131536-SegmentedRaftLogWorker: Rolled log segment from 
> /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-9c8bbe94-a664-4f6d-892b-2c9911aeb6c1/omNode-3/ratis/c9bc4cf4-3bc3-3c60-a66b-523986131536/current/log_inprogress_47
>  to 
> /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-9c8bbe94-a664-4f6d-892b-2c9911aeb6c1/omNode-3/ratis/c9bc4cf4-3bc3-3c60-a66b-523986131536/current/log_47-48
> 2024-01-20 18:19:01,239 [omNode-3-impl-thread1] INFO  
> server.RaftServer$Division (RaftServerImpl.java:lambda$close$3(526)) - 
> omNode-3@group-523986131536: shutdown
> 2024-01-20 18:19:01,240 [omNode-3@group-523986131536-SegmentedRaftLogWorker] 
> INFO  segmented.BufferedWriteChannel (BufferedWriteChannel.java:open(62)) - 
> open log_inprogress_49 at position 0
> 2024-01-20 18:19:01,240 [omNode-3-impl-thread1] INFO  util.JmxRegister 
> (JmxRegister.java:unregister(73)) - Successfully un-registered JMX Bean with 
> object name Ratis:service=RaftServer,group=group-523986131536,id=omNode-3
> 2024-01-20 18:19:01,240 [omNode-3-impl-thread1] INFO  impl.RoleInfo 
> (RoleInfo.java:shutdownLeaderState(94)) - omNode-3: shutdown 
> omNode-3@group-523986131536-LeaderStateImpl
> 2024-01-20 18:19:01,240 [omNode-3-impl-thread1] INFO  impl.PendingRequests 
> (PendingRequests.java:sendNotLeaderResponses(289)) - 
> omNode-3@group-523986131536-PendingRequests: sendNotLeaderResponses
> 2024-01-20 18:19:01,247 [omNode-3@group-523986131536-SegmentedRaftLogWorker] 
> INFO  segmented.SegmentedRaftLogWorker 
> (SegmentedRaftLogWorker.java:execute(634)) - 
> omNode-3@group-523986131536-SegmentedRaftLogWorker: created new log segment 
> /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-9c8bbe94-a664-4f6d-892b-2c9911aeb6c1/omNode-3/ratis/c9bc4cf4-3bc3-3c60-a66b-523986131536/current/log_inprogress_49
> 2024-01-20 18:19:01,260 [Thread-4860] INFO  server.GrpcServerProtocolClient 
> (GrpcServerProtocolClient.java:close(103)) - omNode-1 Close channels
> 2024-01-20 18:19:01,261 [Thread-4861] INFO  server.GrpcServerProtocolClient 
> (GrpcServerProtocolClient.java:close(103)) - omNode-2 Close channels
> 2024-01-20 18:19:01,262 [main] INFO  server.GrpcService 
> (GrpcService.java:closeImpl(320)) - omNode-3: shutdown server 
> GrpcServerProtocolService successfully
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to