[ 
https://issues.apache.org/jira/browse/HDDS-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719834#comment-17719834
 ] 

Hongbing Wang commented on HDDS-8366:
-------------------------------------

[~szetszwo] [~sumitagrawl] Thanks for the help. The version running online does 
not include HDDS-7755. 
This looks like the same problem as HDDS-7755. The logs from just before that 
point in time are as follows:
{noformat}
2023-03-21 00:52:36,466 [IPC Server handler 89 on 9862] ERROR 
org.apache.hadoop.ozone.om.lock.OzoneManagerLock: Thread 'IPC Server handler 89 
on 9862' cannot acquire VOLUME_LOCK lock while holding [BUCKET_LOCK] lock(s).
2023-03-21 00:52:36,466 [IPC Server handler 89 on 9862] WARN 
org.apache.hadoop.ipc.Server: IPC Server handler 89 on 9862, call Call#419046 
Retry#0 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest 
from 10.77.126.36:43592
java.lang.RuntimeException: Thread 'IPC Server handler 89 on 9862' cannot 
acquire VOLUME_LOCK lock while holding [BUCKET_LOCK] lock(s).
        at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.lock(OzoneManagerLock.java:185)
        at org.apache.hadoop.ozone.om.lock.OzoneManagerLock.acquireReadLock(OzoneManagerLock.java:146)
        at org.apache.hadoop.ozone.om.OzoneManager.getVolumeOwner(OzoneManager.java:2472)
        at org.apache.hadoop.ozone.om.OzoneManager.getVolumeOwner(OzoneManager.java:2466)
        at org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2421)
        at org.apache.hadoop.ozone.om.OzoneManager.getVolumeInfo(OzoneManager.java:2636)
        at org.apache.hadoop.ozone.om.OzoneManager.getS3VolumeContext(OzoneManager.java:3519)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getS3VolumeContext(OzoneManagerRequestHandler.java:1215)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(OzoneManagerRequestHandler.java:262)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:226)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:175)
        at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:147)
        at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:886)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:828)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2716){noformat}
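
The error in the log above has the shape of a lock-ordering check failing: the handler thread asks for VOLUME_LOCK while it still holds BUCKET_LOCK. The following is only a minimal Java sketch of how such a check can behave, not the actual OzoneManagerLock code. The class `LockOrderSketch`, the `Level` enum and the `acquire`/`release` helpers are hypothetical names, and the assumption that VOLUME_LOCK must be taken before BUCKET_LOCK is inferred from the error message alone.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class LockOrderSketch {
    /** Hypothetical lock levels, declared from outermost to innermost. */
    enum Level { VOLUME_LOCK, BUCKET_LOCK }

    /** Levels currently held by the calling thread (bookkeeping only, no real locking). */
    private static final ThreadLocal<EnumSet<Level>> HELD =
        ThreadLocal.withInitial(() -> EnumSet.noneOf(Level.class));

    /** Records the level as held, or throws if a more inner level is already held. */
    static void acquire(Level wanted) {
        List<Level> blocking = new ArrayList<>();
        for (Level held : HELD.get()) {
            if (held.ordinal() > wanted.ordinal()) {
                blocking.add(held);
            }
        }
        if (!blocking.isEmpty()) {
            throw new RuntimeException("Thread '" + Thread.currentThread().getName()
                + "' cannot acquire " + wanted + " lock while holding "
                + blocking + " lock(s).");
        }
        HELD.get().add(wanted);
    }

    /** Marks the level as no longer held by the calling thread. */
    static void release(Level level) {
        HELD.get().remove(level);
    }

    public static void main(String[] args) {
        // Correct order: outer lock first, then inner lock.
        acquire(Level.VOLUME_LOCK);
        acquire(Level.BUCKET_LOCK);
        release(Level.BUCKET_LOCK);
        release(Level.VOLUME_LOCK);

        // Simulate a bucket lock that an earlier request never released.
        acquire(Level.BUCKET_LOCK);
        try {
            acquire(Level.VOLUME_LOCK);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        } finally {
            release(Level.BUCKET_LOCK);
        }
    }
}
```

In this sketch the held levels are tracked per thread, so a lock left behind by one request keeps rejecting later requests that the same pooled handler thread serves. Whether that is what happened here still needs to be confirmed against the HDDS-7755 change.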

> OzoneManager hangs when submitRequestToRatis
> --------------------------------------------
>
>                 Key: HDDS-8366
>                 URL: https://issues.apache.org/jira/browse/HDDS-8366
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM, Ozone Manager
>    Affects Versions: 1.3.0
>            Reporter: Hongbing Wang
>            Assignee: Sumit Agrawal
>            Priority: Critical
>         Attachments: om.abnormal.jstack, om.normal.jstack, om_rpc_callqueue_ 
> accumulation.png
>
>
> All OM RPC handlers hang when calling 
> `OzoneManagerRatisServer#submitRequestToRatis`. The key stack is as follows:
> {noformat}
> "IPC Server handler 99 on 9862" #187 daemon prio=5 os_prio=0 
> tid=0x00007f1897b4c000 nid=0x10fa63 waiting on condition [0x00007f05a5b48000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00007f08a185e050> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>       at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>       at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>       at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>       at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequestToRatis(OzoneManagerRatisServer.java:285)
>       at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:247)
>       at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestToRatis(OzoneManagerProtocolServerSideTranslatorPB.java:217)
>       at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:198)
>       at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB$$Lambda$696/251832800.apply(Unknown Source)
>       at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
>       at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:147)
>       at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:886)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:828)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2716)
>    Locked ownable synchronizers:
>       - None
> {noformat}
> For the complete abnormal stack, see [^om.abnormal.jstack] (also available as a 
> [web link|https://github.com/whbing/issue_logs/blob/main/ozone/omrpc20230323/om.abnormal.jstack]).
> For comparison, the normal stack is in [^om.normal.jstack] (also available as a 
> [web link|https://github.com/whbing/issue_logs/blob/main/ozone/omrpc20230323/om.normal.jstack]).
> The IPC debug log is as follows:
> {noformat}
> 2023-03-22 13:17:56,135 [Socket Reader #1 for port 9862] DEBUG 
> org.apache.hadoop.ipc.Server: Successfully authorized userInfo {
>   effectiveUser: "xxx"
> }
> protocol: "org.apache.hadoop.hdds.protocol.GenericRefreshProtocol"
> 2023-03-22 13:17:56,135 [Socket Reader #1 for port 9862] DEBUG 
> org.apache.hadoop.ipc.Server:  got #0
> 2023-03-22 13:17:57,143 [IPC Server idle connection scanner for port 9862] 
> DEBUG org.apache.hadoop.ipc.Server: IPC Server idle connection scanner for 
> port 9862: task running
> 2023-03-22 13:17:57,946 [Socket Reader #1 for port 9862] DEBUG 
> org.apache.hadoop.ipc.Server:  got #-4
> 2023-03-22 13:17:57,946 [Socket Reader #1 for port 9862] DEBUG 
> org.apache.hadoop.ipc.Server: Received ping message
> 2023-03-22 13:18:07,143 [IPC Server idle connection scanner for port 9862] 
> DEBUG org.apache.hadoop.ipc.Server: IPC Server idle connection scanner for 
> port 9862: task running
> 2023-03-22 13:18:13,536 [Socket Reader #1 for port 9862] DEBUG 
> org.apache.hadoop.ipc.Server:  got #-4
> 2023-03-22 13:18:13,536 [Socket Reader #1 for port 9862] DEBUG 
> org.apache.hadoop.ipc.Server: Received ping message
> {noformat}
> RPCs are backlogged in the callQueue: 
>  !om_rpc_callqueue_ accumulation.png! 
>  


