[ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969890#comment-14969890
 ] 

Elliott Clark commented on HDFS-9289:
-------------------------------------

We just had something very similar happen on a prod cluster. Then the 
datanode holding the only complete block was shut off for repair.

{code}
15/10/22 06:29:32 INFO hdfs.StateChange: BLOCK* allocateBlock: 
/TESTCLUSTER-HBASE/WALs/hbase4544.test.com,16020,1444266312515/hbase4544.test.com%2C16020%2C1444266312515.default.1445520572440.
 BP-1735829752-10.210.49.21-1437433901380 
blk_1190230043_116735085{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-52d9a122-a46a-4129-ab3d-d9041de109f8:NORMAL:10.210.31.48:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-c734b72e-27de-4dd4-a46c-7ae59f6ef792:NORMAL:10.210.31.38:50010|RBW]]}
15/10/22 06:32:48 INFO namenode.FSNamesystem: 
updatePipeline(block=BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085,
 newGenerationStamp=116737586, newLength=201675125, 
newNodes=[10.210.81.33:50010, 10.210.81.45:50010, 10.210.64.29:50010], 
clientName=DFSClient_NONMAPREDUCE_1976436475_1)
15/10/22 06:32:48 INFO namenode.FSNamesystem: 
updatePipeline(BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085)
 successfully to 
BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116737586
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.210.64.29:50010 is added to 
blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-d5f7fff9-005d-4804-a223-b6e6624d3af2:NORMAL:10.210.81.45:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED]]}
 size 201681322
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.210.81.45:50010 is added to 
blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED],
 
ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED]]}
 size 201681322
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.210.81.33:50010 is added to 
blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED],
 
ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED],
 
ReplicaUnderConstruction[[DISK]DS-4d937567-7184-40b7-a822-c7e3b5d588d4:NORMAL:10.210.81.33:50010|FINALIZED]]}
 size 201681322
15/10/22 09:37:36 INFO BlockStateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 
10.210.31.38:50010 by hbase4678.test.com/10.210.31.38 because reported RBW 
replica with genstamp 116735085 does not match COMPLETE block's genstamp in 
block map 116737586
15/10/22 09:37:36 INFO BlockStateChange: BLOCK* invalidateBlock: 
blk_1190230043_116735085(stored=blk_1190230043_116737586) on 10.210.31.38:50010
15/10/22 09:37:36 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1190230043_116735085 to 10.210.31.38:50010
15/10/22 09:37:39 INFO BlockStateChange: BLOCK* BlockManager: ask 
10.210.31.38:50010 to delete [blk_1190230043_116735085]
15/10/22 12:45:03 INFO BlockStateChange: BLOCK* ask 10.210.64.29:50010 to 
replicate blk_1190230043_116737586 to datanode(s) 10.210.64.56:50010
15/10/22 12:45:07 INFO BlockStateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 
10.210.64.29:50010 by hbase4496.test.com/10.210.64.56 because client machine 
reported it
15/10/22 12:50:49 INFO BlockStateChange: BLOCK* ask 10.210.81.45:50010 to 
replicate blk_1190230043_116737586 to datanode(s) 10.210.49.49:50010
15/10/22 12:50:55 INFO BlockStateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on 
10.210.81.45:50010 by hbase4478.test.com/10.210.49.49 because client machine 
reported it
15/10/22 12:56:01 WARN blockmanagement.BlockManager: PendingReplicationMonitor 
timed out blk_1190230043_116737586
{code}

The patch will help, but the underlying issue will still be there. Is there 
some way to keep the genstamps from getting out of sync in the first place?
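For what it's worth, the shape of the proposed check is simple: at file 
complete, refuse to commit a block whose client-reported generation stamp is 
older than the stamp the NameNode recorded at the last updatePipeline. The 
sketch below is illustrative only; the class and method names are hypothetical 
and not the real FSNamesystem API.

{code}
// Hypothetical sketch, not actual HDFS code: reject a complete-file request
// whose generation stamp predates the one recorded by updatePipeline.
public class GenStampCheck {
    /**
     * Returns true only when the client-reported generation stamp matches
     * the stamp stored in the block map for this block.
     */
    public static boolean canCommit(long storedGenStamp, long reportedGenStamp) {
        // A stale stamp (e.g. 116735085 vs 116737586 in the log above) means
        // the client is completing against a pre-recovery pipeline; committing
        // it would later mark the recovered replicas corrupt, so refuse here.
        return reportedGenStamp == storedGenStamp;
    }

    public static void main(String[] args) {
        // Stamps taken from the log excerpt above.
        System.out.println(canCommit(116737586L, 116735085L)); // stale stamp
        System.out.println(canCommit(116737586L, 116737586L)); // current stamp
    }
}
{code}

With a check like this the complete call would fail fast and the client would 
retry, instead of the NameNode committing a block that no surviving replica 
actually matches.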

> check genStamp when complete file
> ---------------------------------
>
>                 Key: HDFS-9289
>                 URL: https://issues.apache.org/jira/browse/HDFS-9289
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>         Attachments: HDFS-9289.1.patch
>
>
> we have seen a case of a corrupt block caused by a file being completed 
> after a pipelineUpdate, but with the old block genStamp. This caused the 
> replicas on the two datanodes in the updated pipeline to be viewed as 
> corrupt. Propose to check the genStamp when committing the block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
