[ https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766255#action_12766255 ]

Hairong Kuang commented on HDFS-668:
------------------------------------

Even with Konstantin's quick fix, TC7 still occasionally hangs because it is 
not able to close the file. There is yet another bug. :-( The logs below 
helped me pinpoint the problem:

    [junit] 2009-10-14 23:15:07,359 DEBUG hdfs.StateChange 
(NameNode.java:blockReceived(948)) - *BLOCK* NameNode.blockReceived: from 
127.0.0.1:57020 1 blocks.
    [junit] 2009-10-14 23:15:07,364 DEBUG hdfs.StateChange 
(FSNamesystem.java:blockReceived(2657)) - BLOCK* NameSystem.blockReceived: 
blk_-3647824922726121286_1008 is received from 127.0.0.1:57020
    [junit] 2009-10-14 23:15:07,364 DEBUG namenode.FSNamesystem 
(DatanodeDescriptor.java:processReportedBlock(466)) - Reported block 
blk_-3647824922726121286_1008 on 127.0.0.1:57020 size 65536 replicaState = 
FINALIZED
    [junit] 2009-10-14 23:15:07,364 DEBUG namenode.FSNamesystem 
(DatanodeDescriptor.java:processReportedBlock(479)) - In memory blockUCState = 
UNDER_CONSTRUCTION
    [junit] 2009-10-14 23:15:07,365 WARN  namenode.FSNamesystem 
(BlockManager.java:addStoredBlock(1029)) - Inconsistent size for block 
blk_-3647824922726121286_1008 reported from 127.0.0.1:57020 current size is 
32768 reported size is 65536
    [junit] 2009-10-14 23:15:07,365 INFO  hdfs.StateChange 
(BlockManager.java:addStoredBlock(1098)) - BLOCK* NameSystem.addStoredBlock: 
blockMap updated: 127.0.0.1:57020 is added to BlockInfoUnderConstruction{
    [junit]   blockUCState=UNDER_CONSTRUCTION
    [junit]   replicas=[blk_-3647824922726121286_1007, 
blk_-3647824922726121286_1007]
    [junit]   primaryNodeIndex=-1} size 65536
    [junit] 2009-10-14 23:15:07,365 INFO  namenode.FSNamesystem 
(FSNamesystem.java:updatePipeline(3951)) - 
updatePipeline(block=blk_-3647824922726121286_1007, newGenerationStamp=1008, 
newLength=65536, newNodes=[127.0.0.1:57020], clientName=DFSClient_-1021096837)
    [junit] 2009-10-14 23:15:07,366 DEBUG hdfs.StateChange 
(FSDirectory.java:persistBlocks(321)) - DIR* FSDirectory.persistBlocks: 
/TC7/foo with 1 blocks is persisted to the file system
    [junit] 2009-10-14 23:15:07,369 INFO  namenode.FSNamesystem 
(FSNamesystem.java:updatePipeline(4000)) - 
updatePipeline(blk_-3647824922726121286_1007) successfully to 
blk_-3647824922726121286_1008
    [junit] 2009-10-14 23:15:07,370 DEBUG hdfs.DFSClient 
(DFSClient.java:endBlock(2613)) - Closing old block 
blk_-3647824922726121286_1008
...
    [junit] 2009-10-14 23:15:12,317 DEBUG hdfs.StateChange 
(NameNode.java:complete(642)) - *DIR* NameNode.complete: /TC7/foo for 
DFSClient_-1021096837
    [junit] 2009-10-14 23:15:12,317 DEBUG hdfs.StateChange 
(FSNamesystem.java:completeFileInternal(1318)) - DIR* NameSystem.completeFile: 
/TC7/foo for DFSClient_-1021096837
    [junit] 2009-10-14 23:15:12,318 INFO  hdfs.StateChange 
(FSNamesystem.java:checkFileProgress(1402)) - BLOCK* 
NameSystem.checkFileProgress: block BlockInfoUnderConstruction{
    [junit]   blockUCState=COMMITTED
    [junit]   replicas=[blk_-3647824922726121286_1008]
    [junit]   primaryNodeIndex=-1} has not reached minimal replication 1
    [junit] 2009-10-14 23:15:12,720 INFO  hdfs.DFSClient 
(DFSClient.java:completeFile(3634)) - Could not complete file /TC7/foo 
retrying...
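
Reading the trace: the datanode's blockReceived for the new generation stamp 
(1008) reaches the namenode before the client's updatePipeline call, so the 
FINALIZED replica report is processed while the in-memory block is still 
UNDER_CONSTRUCTION at the old stamp. The replica is filed against the 
under-construction state rather than counted as a finalized copy, so 
checkFileProgress keeps failing the minimal-replication test and completeFile 
retries forever. Below is a minimal, hypothetical sketch of that interleaving; 
the class and method names only echo the log messages, and the real NameNode 
data structures are considerably more involved:

    import java.util.ArrayList;
    import java.util.List;

    /**
     * Simplified model of the race suggested by the trace above.
     * Not the actual HDFS code; a sketch under assumed structures.
     */
    public class Tc7RaceSketch {

        enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED }

        static class BlockInfoUC {
            BlockUCState state = BlockUCState.UNDER_CONSTRUCTION;
            // pipeline (expected) replicas, not counted toward replication
            final List<String> expectedReplicas = new ArrayList<>();
            // finalized replicas, counted by checkFileProgress
            final List<String> finalizedReplicas = new ArrayList<>();
        }

        /** A FINALIZED report that lands while the in-memory block is still
         *  UNDER_CONSTRUCTION gets recorded as a pipeline replica only. */
        static void processReportedBlock(BlockInfoUC b, String dn) {
            if (b.state == BlockUCState.UNDER_CONSTRUCTION) {
                b.expectedReplicas.add(dn);   // not counted
            } else {
                b.finalizedReplicas.add(dn);  // counted
            }
        }

        /** completeFile succeeds only with enough finalized replicas. */
        static boolean checkFileProgress(BlockInfoUC b, int minReplication) {
            return b.finalizedReplicas.size() >= minReplication;
        }

        public static void main(String[] args) {
            BlockInfoUC blk = new BlockInfoUC();
            // 1) blockReceived(FINALIZED, gs=1008) beats updatePipeline,
            //    so the block still looks UNDER_CONSTRUCTION.
            processReportedBlock(blk, "127.0.0.1:57020");
            // 2) The client closes the file; the block is COMMITTED.
            blk.state = BlockUCState.COMMITTED;
            // 3) No finalized replica was ever recorded, and the datanode
            //    will not re-report, so completeFile retries forever.
            System.out.println("can complete: "
                    + checkFileProgress(blk, 1));   // prints: false
        }
    }
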
 

> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>
>                 Key: HDFS-668
>                 URL: https://issues.apache.org/jira/browse/HDFS-668
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hdfs-668.patch
>
>
> TestFileAppend3 hangs because it fails to close the file. The following log 
> snippet shows the cause of the problem:
>     [junit] 2009-10-01 07:00:00,719 WARN  hdfs.DFSClient 
> (DFSClient.java:setupPipelineForAppendOrRecovery(3004)) - Error Recovery for 
> block blk_-4098350497078465335_1007 in pipeline 127.0.0.1:58375, 
> 127.0.0.1:36982: bad datanode 127.0.0.1:36982
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode 
> (DataXceiver.java:opWriteBlock(224)) - Receiving block 
> blk_-4098350497078465335_1007 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
>     [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode 
> (FSDataset.java:recoverClose(1248)) - Recover failed close 
> blk_-4098350497078465335_1007
>     [junit] 2009-10-01 07:00:00,723 INFO  datanode.DataNode 
> (DataXceiver.java:opWriteBlock(369)) - Received block 
> blk_-4098350497078465335_1008 src: /127.0.0.1:40252 dest: /127.0.0.1:58375 of 
> size 65536
>     [junit] 2009-10-01 07:00:00,724 INFO  hdfs.StateChange 
> (BlockManager.java:addStoredBlock(1006)) - BLOCK* NameSystem.addStoredBlock: 
> addStoredBlock request received for blk_-4098350497078465335_1008 on 
> 127.0.0.1:58375 size 65536 But it does not belong to any file.
>     [junit] 2009-10-01 07:00:00,724 INFO  namenode.FSNamesystem 
> (FSNamesystem.java:updatePipeline(3946)) - 
> updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008, 
> newLength=65536, newNodes=[127.0.0.1:58375], clientName=DFSClient_995688145)
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
