[ https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766255#action_12766255 ]
Hairong Kuang commented on HDFS-668:
------------------------------------
Even with Konstantin's quick fix, TC7 still occasionally hangs because it is
unable to close the file. There is yet another bug. :-( The logs below helped
me pinpoint the problem; a small sketch of the resulting close-retry loop
follows the snippet:
[junit] 2009-10-14 23:15:07,359 DEBUG hdfs.StateChange
(NameNode.java:blockReceived(948)) - *BLOCK* NameNode.blockReceived: from
127.0.0.1:57020 1 blocks.
[junit] 2009-10-14 23:15:07,364 DEBUG hdfs.StateChange
(FSNamesystem.java:blockReceived(2657)) - BLOCK* NameSystem.blockReceived:
blk_-3647824922726121286_1008 is received from 127.0.0.1:57020
[junit] 2009-10-14 23:15:07,364 DEBUG namenode.FSNamesystem
(DatanodeDescriptor.java:processReportedBlock(466)) - Reported block
blk_-3647824922726121286_1008 on 127.0.0.1:57020 size 65536 replicaState =
FINALIZED
[junit] 2009-10-14 23:15:07,364 DEBUG namenode.FSNamesystem
(DatanodeDescriptor.java:processReportedBlock(479)) - In memory blockUCState =
UNDER_CONSTRUCTION
[junit] 2009-10-14 23:15:07,365 WARN namenode.FSNamesystem
(BlockManager.java:addStoredBlock(1029)) - Inconsistent size for block
blk_-3647824922726121286_1008 reported from 127.0.0.1:57020 current size is
32768 reported size is 65536
[junit] 2009-10-14 23:15:07,365 INFO hdfs.StateChange
(BlockManager.java:addStoredBlock(1098)) - BLOCK* NameSystem.addStoredBlock:
blockMap updated: 127.0.0.1:57020 is added to BlockInfoUnderConstruction{
[junit] blockUCState=UNDER_CONSTRUCTION
[junit] replicas=[blk_-3647824922726121286_1007,
blk_-3647824922726121286_1007]
[junit] primaryNodeIndex=-1} size 65536
[junit] 2009-10-14 23:15:07,365 INFO namenode.FSNamesystem
(FSNamesystem.java:updatePipeline(3951)) -
updatePipeline(block=blk_-3647824922726121286_1007, newGenerationStamp=1008,
newLength=65536, newNodes=[127.0.0.1:57020], clientName=DFSClient_-1021096837)
[junit] 2009-10-14 23:15:07,366 DEBUG hdfs.StateChange
(FSDirectory.java:persistBlocks(321)) - DIR* FSDirectory.persistBlocks:
/TC7/foo with 1 blocks is persisted to the file system
[junit] 2009-10-14 23:15:07,369 INFO namenode.FSNamesystem
(FSNamesystem.java:updatePipeline(4000)) -
updatePipeline(blk_-3647824922726121286_1007) successfully to
blk_-3647824922726121286_1008
[junit] 2009-10-14 23:15:07,370 DEBUG hdfs.DFSClient
(DFSClient.java:endBlock(2613)) - Closing old block
blk_-3647824922726121286_1008
...
[junit] 2009-10-14 23:15:12,317 DEBUG hdfs.StateChange
(NameNode.java:complete(642)) - *DIR* NameNode.complete: /TC7/foo for
DFSClient_-1021096837
[junit] 2009-10-14 23:15:12,317 DEBUG hdfs.StateChange
(FSNamesystem.java:completeFileInternal(1318)) - DIR* NameSystem.completeFile:
/TC7/foo for DFSClient_-1021096837
[junit] 2009-10-14 23:15:12,318 INFO hdfs.StateChange
(FSNamesystem.java:checkFileProgress(1402)) - BLOCK*
NameSystem.checkFileProgress: block BlockInfoUnderConstruction{
[junit] blockUCState=COMMITTED
[junit] replicas=[blk_-3647824922726121286_1008]
[junit] primaryNodeIndex=-1} has not reached minimal replication 1
[junit] 2009-10-14 23:15:12,720 INFO hdfs.DFSClient
(DFSClient.java:completeFile(3634)) - Could not complete file /TC7/foo
retrying...
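
To make the hang easier to see, here is a minimal, hypothetical sketch (the
class, fields, and method names are made up for illustration; this is not the
actual DFSClient/FSNamesystem code). The idea: the blockReceived above was
applied while the in-memory block was still the stale genstamp-1007
under-construction record, so after updatePipeline moves the block to 1008,
checkFileProgress never sees a replica for the committed block, complete()
never returns true, and the client retries forever.

{code}
// Hypothetical sketch only -- not the real HDFS 0.21 code.
public class CompleteRetrySketch {

  // Made-up stand-in for the NameNode's view of the last block of /TC7/foo.
  static class LastBlockState {
    long committedGenStamp = 1008; // set by updatePipeline
    long reportedGenStamp = 1007;  // blockReceived landed on the stale record
    int minReplication = 1;

    boolean checkFileProgress() {
      // Only replicas reported under the committed generation stamp count,
      // so the stale 1007 report never satisfies minimal replication.
      int liveReplicas = (reportedGenStamp == committedGenStamp) ? 1 : 0;
      return liveReplicas >= minReplication;
    }
  }

  // Made-up complete() call: returns true once the file may be closed.
  static boolean complete(LastBlockState lastBlock) {
    return lastBlock.checkFileProgress();
  }

  public static void main(String[] args) throws InterruptedException {
    LastBlockState lastBlock = new LastBlockState();
    int attempts = 0;
    // Client-side close loop: keep retrying until complete() succeeds.
    // With the stale replica record it never does; capped here so the
    // sketch terminates instead of hanging like TC7 does.
    while (!complete(lastBlock) && attempts < 5) {
      attempts++;
      System.out.println("Could not complete file /TC7/foo retrying..."
          + " (attempt " + attempts + ")");
      Thread.sleep(400); // roughly the retry gap visible in the log above
    }
    if (!complete(lastBlock)) {
      System.out.println("Still not complete; the real client keeps"
          + " retrying, hence the hang.");
    }
  }
}
{code}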
> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>
> Key: HDFS-668
> URL: https://issues.apache.org/jira/browse/HDFS-668
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 0.21.0
> Reporter: Hairong Kuang
> Fix For: 0.21.0
>
> Attachments: hdfs-668.patch
>
>
> TestFileAppend3 hangs because it fails to close the file. The following is
> a snippet of the logs that shows the cause of the problem:
> [junit] 2009-10-01 07:00:00,719 WARN hdfs.DFSClient
> (DFSClient.java:setupPipelineForAppendOrRecovery(3004)) - Error Recovery for
> block blk_-4098350497078465335_1007 in pipeline 127.0.0.1:58375,
> 127.0.0.1:36982: bad datanode 127.0.0.1:36982
> [junit] 2009-10-01 07:00:00,721 INFO datanode.DataNode
> (DataXceiver.java:opWriteBlock(224)) - Receiving block
> blk_-4098350497078465335_1007 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
> [junit] 2009-10-01 07:00:00,721 INFO datanode.DataNode
> (FSDataset.java:recoverClose(1248)) - Recover failed close
> blk_-4098350497078465335_1007
> [junit] 2009-10-01 07:00:00,723 INFO datanode.DataNode
> (DataXceiver.java:opWriteBlock(369)) - Received block
> blk_-4098350497078465335_1008 src: /127.0.0.1:40252 dest: /127.0.0.1:58375 of
> size 65536
> [junit] 2009-10-01 07:00:00,724 INFO hdfs.StateChange
> (BlockManager.java:addStoredBlock(1006)) - BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-4098350497078465335_1008 on
> 127.0.0.1:58375 size 65536 But it does not belong to any file.
> [junit] 2009-10-01 07:00:00,724 INFO namenode.FSNamesystem
> (FSNamesystem.java:updatePipeline(3946)) -
> updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008,
> newLength=65536, newNodes=[127.0.0.1:58375], clientName=DFSClient_995688145)
>