[ https://issues.apache.org/jira/browse/HDFS-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018295#comment-15018295 ]
Kihwal Lee commented on HDFS-9445:
----------------------------------
And the stack trace:
{noformat}
Java stack information for the threads listed above:
===================================================
"DataXceiver for client DFSClient_attempt_xxx [Sending block BP-xxxxx:blk_123_456]":
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:234)
    - waiting to lock <0x00000000d60d9930> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
    at java.lang.Thread.run(Thread.java:745)
"Thread-565":
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000d55613c8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.readLock(BPOfferService.java:105)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:166)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.checkBlock(BPOfferService.java:249)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.notifyNamenodeDeletedBlock(BPOfferService.java:255)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.notifyNamenodeDeletedBlock(DataNode.java:976)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:1891)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:485)
    - locked <0x00000000d60d9930> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:690)
    - locked <0x00000000d58b9e70> (a org.apache.hadoop.hdfs.server.datanode.DataNode)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3137)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:242)
    at org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3166)
    at java.lang.Thread.run(Thread.java:745)
"DataNode: heartbeating to my-nn:8020":
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1741)
    - waiting to lock <0x00000000d60d9930> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:663)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:656)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getLength(FsDatasetImpl.java:649)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkBlock(FsDatasetImpl.java:1701)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1875)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:1931)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:858)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:672)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
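Reading the trace as a lock-order inversion: "Thread-565" holds the FsDatasetImpl monitor (taken in removeVolumes) and parks waiting for the BPOfferService read lock, while the heartbeat thread is reported as the owner of that ReentrantReadWriteLock (presumably its write lock, since only the write lock records an owner) and is blocked on the same FsDatasetImpl monitor. The DataXceiver thread is a bystander stuck behind that monitor. The following is a minimal standalone sketch of that shape, not HDFS code: the class name, thread names, and the two lock objects are hypothetical stand-ins, and it uses only the JDK's ThreadMXBean to confirm that the JVM sees the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderSketch {

    /** Builds a two-thread lock cycle and returns the sorted names of the deadlocked threads. */
    static List<String> deadlockedThreadNames() {
        // Hypothetical stand-ins: "dataset" plays the FsDatasetImpl monitor,
        // "bpLock" plays the read/write lock inside BPOfferService.
        final Object dataset = new Object();
        final ReentrantReadWriteLock bpLock = new ReentrantReadWriteLock();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like "Thread-565": monitor first, then the read lock.
        Thread remover = new Thread(() -> {
            synchronized (dataset) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                bpLock.readLock().lock();   // parks: the other thread holds the write lock
                bpLock.readLock().unlock();
            }
        }, "sketch-volume-remover");

        // Like the heartbeat thread: write lock first, then the monitor.
        Thread heartbeat = new Thread(() -> {
            bpLock.writeLock().lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (dataset) {
                    // never reached: the other thread owns the monitor
                }
            } finally {
                bpLock.writeLock().unlock();
            }
        }, "sketch-heartbeat");

        // Daemon threads, so the JVM can still exit with the cycle in place.
        remover.setDaemon(true);
        heartbeat.setDaemon(true);
        remover.start();
        heartbeat.start();

        // Poll until the JVM reports a cycle over monitors and ownable synchronizers.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        for (int i = 0; i < 100 && ids == null; i++) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            ids = mx.findDeadlockedThreads();
        }

        List<String> names = new ArrayList<>();
        if (ids != null) {
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                if (info != null) {
                    names.add(info.getThreadName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + deadlockedThreadNames());
    }
}
```

In the sketch, either taking both locks in the same order on every path or releasing the monitor before calling into code that needs the read/write lock would break the cycle. Whether either approach suits the DataNode is a separate question that this example does not try to answer.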
> Deadlock in datanode
> --------------------
>
> Key: HDFS-9445
> URL: https://issues.apache.org/jira/browse/HDFS-9445
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.2
> Reporter: Kihwal Lee
> Priority: Blocker
>
> {noformat}
> Found one Java-level deadlock:
> =============================
> "DataXceiver for client DFSClient_attempt_xxx at /1.2.3.4:100 [Sending block
> BP-xxxxx:blk_123_456]":
> waiting to lock monitor 0x00007f77d0731768 (object 0x00000000d60d9930, a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl),
> which is held by "Thread-565"
> "Thread-565":
> waiting for ownable synchronizer 0x00000000d55613c8, (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
> which is held by "DataNode: heartbeating to my-nn:8020"
> "DataNode: heartbeating to my-nn:8020":
> waiting to lock monitor 0x00007f77d0731768 (object 0x00000000d60d9930, a
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl),
> which is held by "Thread-565"
> {noformat}