[ 
https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766261#action_12766261
 ] 

Hairong Kuang commented on HDFS-668:
------------------------------------

What triggered the file close problem is that the datanode's "blockReceived" 
notification reached the NameNode earlier than the client's pipeline update. Each 
under-construction block has two lists of locations: the first keeps 
track of locations in a pipeline, and the second is a triplet list tracking 
finalized replica locations. On receiving blockReceived, the NameNode put the 
location into the block's triplet list. But when it later received the 
pipeline update, in order to handle the newer generation stamp, it removed the 
old under-construction block entity from the blocks map, constructed a new 
one, and added the new one back into the blocks map. However, the new block 
entity reset the second location list. That's why, when it was time to close the 
file, the NN checked the second list, complained that there was no replica, and 
refused to close the file. 
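The state loss can be sketched roughly as below. This is a simplified illustration of the sequence described above, not the actual HDFS code; the class and field names are made up for clarity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the under-construction block entity.
// It holds the two location lists mentioned in the comment above.
class UnderConstructionBlock {
    final long generationStamp;
    final List<String> pipelineLocations = new ArrayList<>();  // first list
    final List<String> finalizedReplicas = new ArrayList<>();  // second list (triplets)

    UnderConstructionBlock(long generationStamp) {
        this.generationStamp = generationStamp;
    }
}

public class BlockMapSketch {
    public static void main(String[] args) {
        UnderConstructionBlock block = new UnderConstructionBlock(1007);

        // 1. blockReceived arrives first: NN records the finalized replica
        //    in the block's second list.
        block.finalizedReplicas.add("127.0.0.1:58375");

        // 2. The pipeline update arrives later: to adopt the new generation
        //    stamp, the old entity is replaced by a freshly constructed one,
        //    whose finalized-replica list starts out empty.
        UnderConstructionBlock updated = new UnderConstructionBlock(1008);
        updated.pipelineLocations.add("127.0.0.1:58375");

        // 3. At file close, NN consults the second list, finds no replica,
        //    and refuses to close the file.
        System.out.println("finalized replicas after update: "
                + updated.finalizedReplicas.size());
    }
}
```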

> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>
>                 Key: HDFS-668
>                 URL: https://issues.apache.org/jira/browse/HDFS-668
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hdfs-668.patch
>
>
> TestFileAppend3 hangs because it fails to close the file. The following is 
> the snippet of logs that shows the cause of the problem:
>     [junit] 2009-10-01 07:00:00,719 WARN  hdfs.DFSClient 
> (DFSClient.java:setupPipelineForAppendOrRecovery(3004)) - Error Recovery for 
> block blk_-4098350497078465335_1007 in pipeline 127.0.0.1:58375, 
> 127.0.0.1:36982: bad datanode 127.0.0.1:36982
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode 
> (DataXceiver.java:opWriteBlock(224)) - Receiving block 
> blk_-4098350497078465335_1007 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode 
> (FSDataset.java:recoverClose(1248)) - Recover failed close 
> blk_-4098350497078465335_1007
>     [junit] 2009-10-01 07:00:00,723 INFO  datanode.DataNode 
> (DataXceiver.java:opWriteBlock(369)) - Received block 
> blk_-4098350497078465335_1008 src: /127.0.0.1:40252 dest: /127.0.0.1:58375 of 
> size 65536
>     [junit] 2009-10-01 07:00:00,724 INFO  hdfs.StateChange 
> (BlockManager.java:addStoredBlock(1006)) - BLOCK* NameSystem.addStoredBlock: 
> addStoredBlock request received for blk_-4098350497078465335_1008 on 
> 127.0.0.1:58375 size 65536 But it does not belong to any file.
>     [junit] 2009-10-01 07:00:00,724 INFO  namenode.FSNamesystem 
> (FSNamesystem.java:updatePipeline(3946)) - 
> updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008, 
> newLength=65536, newNodes=[127.0.0.1:58375], clientName=DFSClient_995688145)
>  

-- 
This message is automatically generated by JIRA.