[ https://issues.apache.org/jira/browse/HDFS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Akira AJISAKA resolved HDFS-3655.
---------------------------------
    Resolution: Duplicate
    Assignee:     (was: Xiaobo Peng)
    Target Version/s:   (was: 0.22.1)

Closing this issue as duplicate. Please feel free to reopen if you disagree.

> Datanode recoverRbw could hang sometime
> ---------------------------------------
>
>                 Key: HDFS-3655
>                 URL: https://issues.apache.org/jira/browse/HDFS-3655
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 0.22.0, 1.0.3, 2.0.0-alpha
>            Reporter: Ming Ma
>         Attachments: HDFS-3655-0.22-use-join-instead-of-wait.patch,
>                      HDFS-3655-0.22.patch
>
>
> This bug seems to apply to 0.22 and Hadoop 2.0. I will upload the initial fix
> done by my colleague Xiaobo Peng shortly (a logistics issue is being worked
> out so that he can upload the patch himself later).
> recoverRbw tries to kill the old writer thread, but it does so while holding
> the lock (the FSDataset monitor object) that the old writer thread is waiting
> on (for example, in the call to data.getTmpInputStreams).
> "DataXceiver for client /10.110.3.43:40193 [Receiving block blk_-3037542385914640638_57111747 client=DFSClient_attempt_201206021424_0001_m_000401_0]" daemon prio=10 tid=0x00007facf8111800 nid=0x6b64 in Object.wait() [0x00007facd1ddb000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       at java.lang.Thread.join(Thread.java:1186)
>       - locked <0x00000007856c1200> (a org.apache.hadoop.util.Daemon)
>       at java.lang.Thread.join(Thread.java:1239)
>       at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:158)
>       at org.apache.hadoop.hdfs.server.datanode.FSDataset.recoverRbw(FSDataset.java:1347)
>       - locked <0x00000007838398c0> (a org.apache.hadoop.hdfs.server.datanode.FSDataset)
>       at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:119)
>       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlockInternal(DataXceiver.java:391)
>       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:327)
>       at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:405)
>       at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:344)
>       at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
>       at java.lang.Thread.run(Thread.java:662)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
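
For readers of the archive, the lock interaction in the thread dump (a join on the old writer while the FSDataset monitor is held) can be reproduced in isolation. The following is a minimal standalone sketch, not Hadoop code and not the attached patch: the class JoinUnderLockSketch, its methods, and the datasetLock object are hypothetical stand-ins, and the writer body only imitates a call that needs the monitor. It contrasts joining under the lock with joining after the lock has been released.

```java
// Standalone illustration only; none of these names are Hadoop APIs.
class JoinUnderLockSketch {

    /** Joins with a timeout, preserving the caller's interrupt status. */
    private static void joinQuietly(Thread t, long timeoutMs) {
        try {
            t.join(timeoutMs); // timeoutMs must be > 0; join(0) waits forever
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Creates a writer that must enter the shared monitor before it can exit. */
    private static Thread newWriter(Object datasetLock) {
        Thread writer = new Thread(() -> {
            synchronized (datasetLock) {
                // Stand-in for a call such as data.getTmpInputStreams.
            }
        }, "old-writer");
        writer.setDaemon(true);
        return writer;
    }

    /** Returns true if the writer is still alive after a join made under the lock. */
    static boolean joinWhileHoldingLock(long timeoutMs) {
        Object datasetLock = new Object(); // hypothetical stand-in for the FSDataset monitor
        Thread writer = newWriter(datasetLock);
        boolean stillAlive;
        synchronized (datasetLock) {
            writer.start();
            writer.interrupt(); // does not unblock a thread waiting to enter a monitor
            joinQuietly(writer, timeoutMs);
            stillAlive = writer.isAlive();
        }
        joinQuietly(writer, 5_000); // lock released: the writer can now finish
        return stillAlive;
    }

    /** Returns true if the writer is still alive after a join made outside the lock. */
    static boolean joinAfterReleasingLock(long timeoutMs) {
        Object datasetLock = new Object();
        Thread writer = newWriter(datasetLock);
        synchronized (datasetLock) {
            writer.start();
            writer.interrupt();
        }
        joinQuietly(writer, timeoutMs); // monitor is free, so the writer can exit
        return writer.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("alive after join under lock: " + joinWhileHoldingLock(200));
        System.out.println("alive after join outside lock: " + joinAfterReleasingLock(5_000));
    }
}
```

The sketch uses a timed join only so that it terminates; with the untimed join shown in the thread dump, the first case would wait indefinitely. Whether the actual fix releases the lock, changes the wait, or takes another approach is not stated in this message.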