[ https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762439#action_12762439 ]

Konstantin Shvachko commented on HDFS-668:
------------------------------------------

In the case of recovery from a pipeline close failure, the client performs the following steps (sketched in code below):
# requests a new GS from the NN via {{NameNode.updateBlockForPipeline()}};
# sends the new GS to the remaining DNs via {{DataStreamer.createBlockOutputStream()}};
# notifies the NN that the new pipeline is established via {{NameNode.updatePipeline()}}, which updates the block's GS to the new one.
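
To make the ordering concrete, here is a minimal sketch of that three-step sequence, assuming simplified stand-in interfaces rather than the actual HDFS client and NameNode APIs:

{code:java}
// A minimal sketch of the close-recovery sequence above (hypothetical stand-in
// interfaces, not the actual HDFS client/NameNode APIs).
import java.util.List;

interface NameNodeStub {
  // step (1): hand out a new generation stamp for the block being recovered
  long updateBlockForPipeline(long blockId, String clientName);
  // step (3): record the new pipeline; the NN bumps the block's GS and length
  void updatePipeline(String clientName, long blockId,
                      long newGS, long newLength, List<String> newNodes);
}

interface DataNodeStub {
  // step (2): a remaining DN bumps its replica of the block to the new GS
  boolean recoverClose(long blockId, long newGS, long expectedLength);
}

class PipelineCloseRecovery {
  private final NameNodeStub nn;

  PipelineCloseRecovery(NameNodeStub nn) { this.nn = nn; }

  void recover(long blockId, long blockLength, String clientName,
               List<DataNodeStub> remainingDNs, List<String> remainingNodeNames) {
    // (1) get a new GS from the NN
    long newGS = nn.updateBlockForPipeline(blockId, clientName);

    // (2) push the new GS to every remaining DN in the pipeline
    for (DataNodeStub dn : remainingDNs) {
      if (!dn.recoverClose(blockId, newGS, blockLength)) {
        throw new IllegalStateException("DN failed to recover close");
      }
    }

    // (3) tell the NN the new pipeline is up; the NN updates the block's GS
    nn.updatePipeline(clientName, blockId, newGS, blockLength, remainingNodeNames);
  }
}
{code}

The race described below occurs between steps (2) and (3): a DN may report the block with the new GS before the NN has processed {{updatePipeline()}}.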

During (2) the DNs may send {{addBlock()}} to the NN, which can race with the client's notification (3).
You are right that one solution is to ignore the GS when looking up the replica in {{addBlock()}}. The best way to fix it along those lines is to implement HDFS-512.
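
For illustration, a toy version of the GS-agnostic lookup idea, keyed by block ID only. The names here are hypothetical and this is not the real BlockManager code:

{code:java}
// A toy illustration of "ignore the GS during lookup": blocks are keyed by
// block ID alone, so a report carrying the new GS 1008 still matches a block
// the NN currently knows under GS 1007.
import java.util.HashMap;
import java.util.Map;

class StoredBlock {
  final long blockId;
  long generationStamp;
  long numBytes;
  StoredBlock(long blockId, long generationStamp, long numBytes) {
    this.blockId = blockId;
    this.generationStamp = generationStamp;
    this.numBytes = numBytes;
  }
}

class GsAgnosticBlockMap {
  // keyed on block ID only, not on (ID, GS)
  private final Map<Long, StoredBlock> blocks = new HashMap<>();

  void add(StoredBlock b) { blocks.put(b.blockId, b); }

  // look up the block regardless of the generation stamp in the DN's report
  StoredBlock lookup(long reportedBlockId, long reportedGS) {
    return blocks.get(reportedBlockId); // GS deliberately ignored
  }

  public static void main(String[] args) {
    GsAgnosticBlockMap map = new GsAgnosticBlockMap();
    map.add(new StoredBlock(-4098350497078465335L, 1007, 65536));
    // The DN's report arrives with GS 1008 (as in the log below) before the
    // client's updatePipeline(); the lookup still finds the block instead of
    // concluding it "does not belong to any file".
    System.out.println(map.lookup(-4098350497078465335L, 1008) != null); // true
  }
}
{code}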

Another solution would be to set the new GS on the block in (1). That is, {{NameNode.updateBlockForPipeline()}} would not only return the new GS but also update the under-construction block with this GS. I have checked the code and so far do not see any problems with this approach.
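
A rough sketch of that option, with made-up types standing in for the real under-construction block and namesystem code:

{code:java}
// Second option: updateBlockForPipeline() records the new GS on the
// under-construction block at the same time it returns it to the client.
// Made-up types, not the real FSNamesystem/block classes.
class UnderConstructionBlock {
  final long blockId;
  long generationStamp;
  UnderConstructionBlock(long blockId, long generationStamp) {
    this.blockId = blockId;
    this.generationStamp = generationStamp;
  }
}

class GenerationStampManager {
  private long lastGenerationStamp;

  GenerationStampManager(long initialGS) { this.lastGenerationStamp = initialGS; }

  // Return the new GS *and* stamp it on the block immediately, instead of
  // waiting for the client's later updatePipeline() call.
  synchronized long updateBlockForPipeline(UnderConstructionBlock block) {
    long newGS = ++lastGenerationStamp; // simplified NN-wide GS counter
    block.generationStamp = newGS;      // the key change in this option
    return newGS;
  }
}
{code}

With the GS already recorded in (1), a DN report that arrives carrying the new GS matches the block the NN knows about, so the ordering of that report and the client's {{updatePipeline()}} call no longer matters.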

> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>
>                 Key: HDFS-668
>                 URL: https://issues.apache.org/jira/browse/HDFS-668
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>             Fix For: 0.21.0
>
>
> TestFileAppend3 hangs because it fails to close the file. The following log 
> snippet shows the cause of the problem:
>     [junit] 2009-10-01 07:00:00,719 WARN  hdfs.DFSClient 
> (DFSClient.java:setupPipelineForAppendOrRecovery(3004)) - Error Recovery for 
> block blk_-4098350497078465335_1007 in pipeline 127.0.0.1:58375, 
> 127.0.0.1:36982: bad datanode 127.0.0.1:36982
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode 
> (DataXceiver.java:opWriteBlock(224)) - Receiving block 
> blk_-4098350497078465335_1007 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode 
> (FSDataset.java:recoverClose(1248)) - Recover failed close 
> blk_-4098350497078465335_1007
>     [junit] 2009-10-01 07:00:00,723 INFO  datanode.DataNode 
> (DataXceiver.java:opWriteBlock(369)) - Received block 
> blk_-4098350497078465335_1008 src: /127.0.0.1:40252 dest: /127.0.0.1:58375 of 
> size 65536
>     [junit] 2009-10-01 07:00:00,724 INFO  hdfs.StateChange 
> (BlockManager.java:addStoredBlock(1006)) - BLOCK* NameSystem.addStoredBlock: 
> addStoredBlock request received for blk_-4098350497078465335_1008 on 
> 127.0.0.1:58375 size 65536 But it does not belong to any file.
>     [junit] 2009-10-01 07:00:00,724 INFO  namenode.FSNamesystem 
> (FSNamesystem.java:updatePipeline(3946)) - 
> updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008, 
> newLength=65536, newNodes=[127.0.0.1:58375], clientName=DFSClient_995688145)
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
