[
https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766261#action_12766261
]
Hairong Kuang commented on HDFS-668:
------------------------------------
What triggered the file close problem is that the datanode's "blockReceived"
notification reached the NameNode earlier than the client's pipeline update. Each
under-construction block has two lists of locations: the first keeps track of the
locations in the pipeline, and the second is the triplets that track the finalized
replica locations. On receiving blockReceived, the NameNode put the location into
the block's triplets list. But when it later received the pipeline update, in
order to handle the newer generation stamp, it removed the old under-construction
block entity from the blocks map, constructed a new one, and added the new one
back into the blocks map. However, the new block entity reset the second location
list. That's why, when it was time to close the file, the NN checked the second
list, complained that there were no replicas, and refused to close the file.
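
To make the sequence concrete, here is a minimal, self-contained Java sketch of the
race as described above. The class and method names (UnderConstructionBlock,
blockReceived, updatePipelineByReplacing, canCloseFile) are hypothetical stand-ins,
not the real HDFS code; the sketch only illustrates why re-creating the block entry
on a pipeline update drops the replica reported earlier by blockReceived.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PipelineUpdateSketch {

    // Stand-in for an under-construction block entry in the NameNode's blocks map.
    static class UnderConstructionBlock {
        final long blockId;
        long generationStamp;
        // First list: datanodes in the current write pipeline.
        final List<String> pipelineLocations = new ArrayList<>();
        // Second list: datanodes that have reported a finalized replica
        // (the "triplets" that the close check looks at).
        final List<String> finalizedLocations = new ArrayList<>();

        UnderConstructionBlock(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }
    }

    // Stand-in for the blocks map.
    static final Map<Long, UnderConstructionBlock> blocksMap = new HashMap<>();

    // blockReceived: a datanode reports a finalized replica of the block.
    static void blockReceived(long blockId, String datanode) {
        blocksMap.get(blockId).finalizedLocations.add(datanode);
    }

    // updatePipeline as described above: the old entry is removed and a
    // brand-new entry (with an empty finalized-location list) is added back
    // under the new generation stamp, losing the earlier blockReceived.
    static void updatePipelineByReplacing(long blockId, long newGS, List<String> newNodes) {
        blocksMap.remove(blockId);
        UnderConstructionBlock fresh = new UnderConstructionBlock(blockId, newGS);
        fresh.pipelineLocations.addAll(newNodes);
        blocksMap.put(blockId, fresh); // finalizedLocations is empty again
    }

    // Close check: the file can only be closed once at least one finalized
    // replica location is known for its last block.
    static boolean canCloseFile(long blockId) {
        return !blocksMap.get(blockId).finalizedLocations.isEmpty();
    }

    public static void main(String[] args) {
        long blockId = -4098350497078465335L;
        blocksMap.put(blockId, new UnderConstructionBlock(blockId, 1007));

        // 1. The datanode's blockReceived arrives BEFORE the client's pipeline update.
        blockReceived(blockId, "127.0.0.1:58375");

        // 2. The pipeline update (new generation stamp 1008) replaces the block
        //    entry and drops the location recorded in step 1.
        updatePipelineByReplacing(blockId, 1008, Arrays.asList("127.0.0.1:58375"));

        // 3. The close check now sees no finalized replica and refuses to close
        //    the file, which is what makes TestFileAppend3#TC7 hang.
        System.out.println("can close file: " + canCloseFile(blockId)); // prints false
    }
}

Under this sketch, carrying the finalized-location list over to the new entry when
the generation stamp changes would let the close check succeed; whether that is how
hdfs-668.patch addresses it is not shown here.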
> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>
> Key: HDFS-668
> URL: https://issues.apache.org/jira/browse/HDFS-668
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 0.21.0
> Reporter: Hairong Kuang
> Fix For: 0.21.0
>
> Attachments: hdfs-668.patch
>
>
> TestFileAppend3 hangs because it fails to close the file. The following is
> a snippet of the logs that shows the cause of the problem:
> [junit] 2009-10-01 07:00:00,719 WARN hdfs.DFSClient
> (DFSClient.java:setupPipelineForAppendOrRecovery(3004)) - Error Recovery for
> block blk_-4098350497078465335_1007 in pipeline 127.0.0.1:58375,
> 127.0.0.1:36982: bad datanode 127.0.0.1:36982
> [junit] 2009-10-01 07:00:00,721 INFO datanode.DataNode
> (DataXceiver.java:opWriteBlock(224)) - Receiving block
> blk_-4098350497078465335_1007 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
> [junit] 2009-10-01 07:00:00,721 INFO datanode.DataNode
> (FSDataset.java:recoverClose(1248)) - Recover failed close
> blk_-4098350497078465335_1007
> [junit] 2009-10-01 07:00:00,723 INFO datanode.DataNode
> (DataXceiver.java:opWriteBlock(369)) - Received block
> blk_-4098350497078465335_1008 src: /127.0.0.1:40252 dest: /127.0.0.1:58375 of
> size 65536
> [junit] 2009-10-01 07:00:00,724 INFO hdfs.StateChange
> (BlockManager.java:addStoredBlock(1006)) - BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-4098350497078465335_1008 on
> 127.0.0.1:58375 size 65536 But it does not belong to any file.
> [junit] 2009-10-01 07:00:00,724 INFO namenode.FSNamesystem
> (FSNamesystem.java:updatePipeline(3946)) -
> updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008,
> newLength=65536, newNodes=[127.0.0.1:58375], clientName=DFSClient_995688145)
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.