[
https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762439#action_12762439
]
Konstantin Shvachko commented on HDFS-668:
------------------------------------------
When recovering from a pipeline close failure, the client
# requests a new GS from the NN via {{NameNode.updateBlockForPipeline()}};
# sends the new GS to the remaining DNs via {{DataStreamer.createBlockOutputStream()}};
# notifies the NN that the new pipeline is established via {{NameNode.updatePipeline()}}, which updates the block's GS to the new one.
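During (2), DNs may send {{addBlock()}} to the NN, which can race with the
client's notification in (3). If {{addBlock()}} arrives first, the NN still
holds the old GS, so the block reported with the new GS is treated as not
belonging to any file, as the quoted log shows for blk_-4098350497078465335_1008.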
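
The ordering problem can be illustrated with a toy model. This is only a sketch, not Hadoop code: {{PipelineRaceSketch}}, {{ToyNameNode}}, the map it keeps, and the block ID 42 are all hypothetical, and it assumes the NN accepts a DN report only when both the block ID and the GS match what it has stored. The GS values 1007 and 1008 are taken from the quoted log.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these classes exist in Hadoop.
class PipelineRaceSketch {
    /** Hypothetical stand-in for the NN's block map: block ID -> stored GS. */
    static class ToyNameNode {
        private final Map<Long, Long> gsByBlockId = new HashMap<>();
        private long nextGs;

        ToyNameNode(long blockId, long gs) {
            gsByBlockId.put(blockId, gs);
            nextGs = gs + 1;
        }

        // Step (1): hand out a new GS without touching the stored block.
        long updateBlockForPipeline(long blockId) {
            return nextGs++;
        }

        // Step (3): record the new GS on the stored block.
        void updatePipeline(long blockId, long newGs) {
            gsByBlockId.put(blockId, newGs);
        }

        // DN report: accepted only if both ID and GS match the stored block.
        boolean addBlock(long blockId, long reportedGs) {
            Long stored = gsByBlockId.get(blockId);
            return stored != null && stored == reportedGs;
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(42L, 1007L);
        long newGs = nn.updateBlockForPipeline(42L);  // step (1)
        boolean early = nn.addBlock(42L, newGs);      // DN reports during step (2)
        nn.updatePipeline(42L, newGs);                // step (3) arrives afterwards
        boolean late = nn.addBlock(42L, newGs);       // same report, after step (3)
        System.out.println("report before updatePipeline accepted: " + early);
        System.out.println("report after updatePipeline accepted: " + late);
    }
}
```

In this model the same DN report is rejected or accepted depending only on whether it reaches the NN before or after step (3).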
You are right: one solution is to ignore the GS when looking up the replica in
{{addBlock()}}. The best way to fix it that way is to implement HDFS-512.
Another solution would be to set the new GS on the block in (1). That is,
{{NameNode.updateBlockForPipeline()}} would not only return the new GS
but also update the under-construction block with it. I checked the code
and do not see any problems with this approach so far.
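
A minimal sketch of the second approach, under the same caveats: {{EagerGsUpdateSketch}} and {{ToyNameNode}} are hypothetical names, the block ID 42 is arbitrary, and the real {{NameNode.updateBlockForPipeline()}} has a different signature and does more work. The only point is that step (1) stores the new GS before any DN can report it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these classes exist in Hadoop.
class EagerGsUpdateSketch {
    /** Hypothetical NN stand-in: block ID -> GS of the under-construction block. */
    static class ToyNameNode {
        private final Map<Long, Long> gsByBlockId = new HashMap<>();
        private long nextGs;

        ToyNameNode(long blockId, long gs) {
            gsByBlockId.put(blockId, gs);
            nextGs = gs + 1;
        }

        // Step (1), as proposed: return the new GS and also store it on the block.
        long updateBlockForPipeline(long blockId) {
            long newGs = nextGs++;
            gsByBlockId.put(blockId, newGs);
            return newGs;
        }

        // Step (3): the GS is already current, so storing it again changes nothing.
        void updatePipeline(long blockId, long newGs) {
            gsByBlockId.put(blockId, newGs);
        }

        // DN report: accepted only if both ID and GS match the stored block.
        boolean addBlock(long blockId, long reportedGs) {
            Long stored = gsByBlockId.get(blockId);
            return stored != null && stored == reportedGs;
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(42L, 1007L);
        long newGs = nn.updateBlockForPipeline(42L);  // step (1) now updates the block
        boolean early = nn.addBlock(42L, newGs);      // DN reports during step (2)
        nn.updatePipeline(42L, newGs);                // step (3)
        System.out.println("report before updatePipeline accepted: " + early);
    }
}
```

In this model a DN report carrying the new GS matches the stored block regardless of whether it arrives before or after step (3).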
> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>
> Key: HDFS-668
> URL: https://issues.apache.org/jira/browse/HDFS-668
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 0.21.0
> Reporter: Hairong Kuang
> Fix For: 0.21.0
>
>
> TestFileAppend3 hangs because it fails to close the file. The following
> log snippet shows the cause of the problem:
> [junit] 2009-10-01 07:00:00,719 WARN hdfs.DFSClient
> (DFSClient.java:setupPipelineForAppendOrRecovery(3004)) - Error Recovery for
> block blk_-4098350497078465335_1007 in pipeline 127.0.0.1:58375,
> 127.0.0.1:36982: bad datanode 127.0.0.1:36982
> [junit] 2009-10-01 07:00:00,721 INFO datanode.DataNode
> (DataXceiver.java:opWriteBlock(224)) - Receiving block
> blk_-4098350497078465335_1007 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
> [junit] 2009-10-01 07:00:00,721 INFO datanode.DataNode
> (FSDataset.java:recoverClose(1248)) - Recover failed close
> blk_-4098350497078465335_1007
> [junit] 2009-10-01 07:00:00,723 INFO datanode.DataNode
> (DataXceiver.java:opWriteBlock(369)) - Received block
> blk_-4098350497078465335_1008 src: /127.0.0.1:40252 dest: /127.0.0.1:58375 of
> size 65536
> [junit] 2009-10-01 07:00:00,724 INFO hdfs.StateChange
> (BlockManager.java:addStoredBlock(1006)) - BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-4098350497078465335_1008 on
> 127.0.0.1:58375 size 65536 But it does not belong to any file.
> [junit] 2009-10-01 07:00:00,724 INFO namenode.FSNamesystem
> (FSNamesystem.java:updatePipeline(3946)) -
> updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008,
> newLength=65536, newNodes=[127.0.0.1:58375], clientName=DFSClient_995688145)
>