[ https://issues.apache.org/jira/browse/HDFS-16853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685448#comment-17685448 ]
ASF GitHub Bot commented on HDFS-16853:
---------------------------------------
steveloughran opened a new pull request, #5366:
URL: https://github.com/apache/hadoop/pull/5366
### Description of PR
Extension of #5162
This PR tries to address two problems:
1. MUST NOT submit into the blocking queue while closing.
2. MUST NOT call queue.put() inside a synchronized block.
This design does not fully prevent (2), but it should detect the
condition and log a warning if it surfaces:
"Possible overlap in queue shutdown and request"
### How was this patch tested?
No new tests.
### For code changes:
- [X] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> The UT TestLeaseRecovery2#testHardLeaseRecoveryAfterNameNodeRestart failed
> because HADOOP-18324
> -----------------------------------------------------------------------------------------------
>
> Key: HDFS-16853
> URL: https://issues.apache.org/jira/browse/HDFS-16853
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.5
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Blocker
> Labels: pull-request-available
>
> The UT TestLeaseRecovery2#testHardLeaseRecoveryAfterNameNodeRestart failed
> with the error message "Waiting for cluster to become active", and the
> blocked thread's jstack is as follows:
> {code:java}
> "BP-1618793397-192.168.3.4-1669198559828 heartbeating to localhost/127.0.0.1:54673" #260 daemon prio=5 os_prio=31 tid=0x00007fc1108fa000 nid=0x19303 waiting on condition [0x0000700017884000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000007430a9ec0> (a java.util.concurrent.SynchronousQueue$TransferQueue)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.SynchronousQueue$TransferQueue.awaitFulfill(SynchronousQueue.java:762)
>         at java.util.concurrent.SynchronousQueue$TransferQueue.transfer(SynchronousQueue.java:695)
>         at java.util.concurrent.SynchronousQueue.put(SynchronousQueue.java:877)
>         at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1186)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1482)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1429)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
>         at com.sun.proxy.$Proxy23.sendHeartbeat(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:168)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:570)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:714)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:915)
>         at java.lang.Thread.run(Thread.java:748) {code}
> After looking into the code, I found that this bug was introduced by
> HADOOP-18324: RpcRequestSender exited without cleaning up the
> rpcRequestQueue, which left BPServiceActor blocked while sending a request.
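The hang quoted above can be seen in miniature: a SynchronousQueue has no internal buffer, so once the consuming thread exits, any producer calling put() parks forever, which is exactly the WAITING (parking) state the heartbeat thread shows in the jstack. A timed offer() returns instead of parking; this sketch (names are illustrative, not the Hadoop fix) contrasts the two:

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

public class SynchronousQueueHang {
    /** With no consumer taking from the queue, q.put(msg) would park
     *  forever; a timed offer() gives up and reports failure instead. */
    public static boolean trySend(SynchronousQueue<String> q, String msg) {
        try {
            // q.put(msg) here would block indefinitely once the
            // consumer thread (e.g. RpcRequestSender) has exited.
            return q.offer(msg, 100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With no consumer attached, trySend() returns false after the timeout rather than hanging the caller the way the heartbeat thread hung.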
--
This message was sent by Atlassian Jira
(v8.20.10#820010)