[ https://issues.apache.org/jira/browse/HDFS-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018295#comment-15018295 ]

Kihwal Lee commented on HDFS-9445:
----------------------------------

And the stack trace:

{noformat}
Java stack information for the threads listed above:
===================================================
"DataXceiver for client DFSClient_attempt_xxx [Sending block BP-xxxxx:blk_123_456]":
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:234)
        - waiting to lock <0x00000000d60d9930> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:537)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
        at java.lang.Thread.run(Thread.java:745)
"Thread-565":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000d55613c8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.readLock(BPOfferService.java:105)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:166)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.checkBlock(BPOfferService.java:249)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.notifyNamenodeDeletedBlock(BPOfferService.java:255)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.notifyNamenodeDeletedBlock(DataNode.java:976)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:1891)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:485)
        - locked <0x00000000d60d9930> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:690)
        - locked <0x00000000d58b9e70> (a org.apache.hadoop.hdfs.server.datanode.DataNode)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3137)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:242)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3166)
        at java.lang.Thread.run(Thread.java:745)
"DataNode: heartbeating to my-nn:8020":
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.validateBlockFile(FsDatasetImpl.java:1741)
        - waiting to lock <0x00000000d60d9930> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:663)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:656)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getLength(FsDatasetImpl.java:649)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkBlock(FsDatasetImpl.java:1701)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1875)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:1931)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:858)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:672)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
        at java.lang.Thread.run(Thread.java:745)
{noformat}
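
For anyone trying to follow the cycle in the trace, here is a minimal standalone sketch of the same lock ordering. It is not HDFS code: `LockCycleSketch`, `datasetMonitor` and `bposLock` are made-up stand-ins for the FsDatasetImpl monitor and the BPOfferService read/write lock. It also assumes the heartbeat thread holds the write side of that lock, which the trace does not show directly. The sketch uses a timed `tryLock` so it reports the blocked acquisition instead of actually hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockCycleSketch {
    // Hypothetical stand-in for the FsDatasetImpl object monitor.
    static final Object datasetMonitor = new Object();
    // Hypothetical stand-in for the BPOfferService read/write lock.
    static final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();

    /**
     * Plays the role of "Thread-565": takes the dataset monitor, then tries
     * the read lock while a "heartbeat" thread holds the write lock.
     * Returns true only if the read lock could be acquired.
     */
    static boolean volumeRemovalGetsReadLock() throws InterruptedException {
        CountDownLatch writerHoldsLock = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread heartbeat = new Thread(() -> {
            bposLock.writeLock().lock();   // assumed: command processing holds the write side
            try {
                writerHoldsLock.countDown();
                // In the real trace this thread would now block on
                // synchronized (datasetMonitor); here it just waits.
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                bposLock.writeLock().unlock();
            }
        }, "heartbeat-sketch");
        heartbeat.start();
        writerHoldsLock.await();

        boolean acquired;
        synchronized (datasetMonitor) {    // like removeVolumes() holding the dataset monitor
            // A plain readLock().lock() here would park forever in the real cycle.
            acquired = bposLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                bposLock.readLock().unlock();
            }
        }
        release.countDown();
        heartbeat.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("read lock acquired: " + volumeRemovalGetsReadLock());
    }
}
```

With an untimed `lock()` and the heartbeat thread really entering `synchronized (datasetMonitor)`, neither thread could make progress, which matches the cycle in the trace.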

> Deadlock in datanode
> --------------------
>
>                 Key: HDFS-9445
>                 URL: https://issues.apache.org/jira/browse/HDFS-9445
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Kihwal Lee
>            Priority: Blocker
>
> {noformat}
> Found one Java-level deadlock:
> =============================
> "DataXceiver for client DFSClient_attempt_xxx at /1.2.3.4:100 [Sending block BP-xxxxx:blk_123_456]":
>   waiting to lock monitor 0x00007f77d0731768 (object 0x00000000d60d9930, a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl),
>   which is held by "Thread-565"
> "Thread-565":
>   waiting for ownable synchronizer 0x00000000d55613c8, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "DataNode: heartbeating to my-nn:8020"
> "DataNode: heartbeating to my-nn:8020":
>   waiting to lock monitor 0x00007f77d0731768 (object 0x00000000d60d9930, a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl),
>   which is held by "Thread-565"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
