Ming Ma created HDFS-3655:
-----------------------------
Summary: datanode recoverRbw could hang sometimes
Key: HDFS-3655
URL: https://issues.apache.org/jira/browse/HDFS-3655
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Reporter: Ming Ma
Fix For: 0.22.1
This bug seems to apply to both 0.22 and Hadoop 2.0. I will shortly upload the
initial fix done by my colleague Xiaobo Peng (some logistics issues are being
worked out so that he can upload patches himself later).
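recoverRbw tries to stop the old writer thread while holding the lock (the
FSDataset monitor object) that the old writer thread is itself waiting on, for
example in the call to data.getTmpInputStreams. recoverRbw then waits
indefinitely in Thread.join() for the old writer to exit, while the old writer
cannot make progress until recoverRbw releases the FSDataset lock, so both
threads hang.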
"DataXceiver for client /10.110.3.43:40193 [Receiving block blk_-3037542385914640638_57111747 client=DFSClient_attempt_201206021424_0001_m_000401_0]" daemon prio=10 tid=0x00007facf8111800 nid=0x6b64 in Object.wait() [0x00007facd1ddb000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1186)
    - locked <0x00000007856c1200> (a org.apache.hadoop.util.Daemon)
    at java.lang.Thread.join(Thread.java:1239)
    at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:158)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.recoverRbw(FSDataset.java:1347)
    - locked <0x00000007838398c0> (a org.apache.hadoop.hdfs.server.datanode.FSDataset)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:119)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlockInternal(DataXceiver.java:391)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:327)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:405)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:344)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
    at java.lang.Thread.run(Thread.java:662)
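The hang is a lock-ordering cycle. The thread in recoverRbw holds the FSDataset
monitor (0x00000007838398c0 above) and joins the old writer, while the old
writer cannot finish until it enters that same monitor. The Java sketch below
reproduces the pattern in isolation. It is not Hadoop code: `dataset` is a
hypothetical stand-in for the FSDataset monitor, the "old-writer" thread stands
in for the replica's previous writer, and a bounded join is used so the demo
terminates instead of hanging. The second method shows one possible mitigation
(stopping the writer before taking the dataset lock). That mitigation is an
assumption for illustration, not necessarily what the attached patch does.

```java
/*
 * Sketch of the recoverRbw lock-ordering hang. Not Hadoop code: "dataset" is a
 * hypothetical stand-in for the FSDataset monitor and the "old-writer" thread
 * stands in for the replica's previous writer thread.
 */
public class RecoverRbwHangSketch {

    // Hypothetical stand-in for the FSDataset monitor object.
    static final Object dataset = new Object();

    // Creates (but does not start) a writer that, like the old writer thread,
    // must enter the dataset monitor before it can finish (e.g. a synchronized
    // call such as data.getTmpInputStreams()).
    static Thread newWriter() {
        Thread writer = new Thread(() -> {
            synchronized (dataset) {
                // Work done under the dataset lock; empty for the sketch.
            }
        }, "old-writer");
        writer.setDaemon(true); // the demo threads never keep the JVM alive
        return writer;
    }

    // Bounded join that preserves the caller's interrupt status.
    static void joinQuietly(Thread t, long timeoutMs) {
        try {
            t.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Pattern from the stack trace: stop the writer while holding the dataset
    // lock. Returns true if the writer terminated within timeoutMs.
    static boolean stopWriterHoldingLock(long timeoutMs) {
        Thread writer = newWriter();
        synchronized (dataset) {
            writer.start();     // the writer will block entering the dataset monitor
            writer.interrupt(); // does not unblock a thread waiting for monitor entry
            joinQuietly(writer, timeoutMs); // an untimed join() would never return here
            return !writer.isAlive();
        }
    }

    // Possible mitigation (an assumption, not the attached patch): stop the
    // writer before taking the dataset lock, then do the recovery under it.
    static boolean stopWriterBeforeLock(long timeoutMs) {
        Thread writer = newWriter();
        writer.start();
        writer.interrupt();
        joinQuietly(writer, timeoutMs); // the lock is free, so the writer can finish
        boolean stopped = !writer.isAlive();
        synchronized (dataset) {
            // Recovery of the RBW replica would continue here; a real fix would
            // need to re-check the replica state after reacquiring the lock.
        }
        return stopped;
    }

    public static void main(String[] args) {
        System.out.println("stopped while holding lock: " + stopWriterHoldingLock(200));
        System.out.println("stopped before taking lock: " + stopWriterBeforeLock(5000));
    }
}
```

Running main should report false for the first case, because after the timeout
the writer is still blocked entering the monitor, and true for the second.
Bounding the join while keeping it under the lock only turns an indefinite hang
into a reportable failure; moving the join outside the lock breaks the cycle but
opens a window in which the replica can change before the lock is reacquired.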
--
This message is automatically generated by JIRA.