[
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969890#comment-14969890
]
Elliott Clark commented on HDFS-9289:
-------------------------------------
We just had something very similar happen on a prod cluster. Then the
datanode holding the only complete replica was shut off for repair.
{code}
15/10/22 06:29:32 INFO hdfs.StateChange: BLOCK* allocateBlock:
/TESTCLUSTER-HBASE/WALs/hbase4544.test.com,16020,1444266312515/hbase4544.test.com%2C16020%2C1444266312515.default.1445520572440.
BP-1735829752-10.210.49.21-1437433901380
blk_1190230043_116735085{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-52d9a122-a46a-4129-ab3d-d9041de109f8:NORMAL:10.210.31.48:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-c734b72e-27de-4dd4-a46c-7ae59f6ef792:NORMAL:10.210.31.38:50010|RBW]]}
15/10/22 06:32:48 INFO namenode.FSNamesystem:
updatePipeline(block=BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085,
newGenerationStamp=116737586, newLength=201675125,
newNodes=[10.210.81.33:50010, 10.210.81.45:50010, 10.210.64.29:50010],
clientName=DFSClient_NONMAPREDUCE_1976436475_1)
15/10/22 06:32:48 INFO namenode.FSNamesystem:
updatePipeline(BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116735085)
successfully to
BP-1735829752-10.210.49.21-1437433901380:blk_1190230043_116737586
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap
updated: 10.210.64.29:50010 is added to
blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-d5f7fff9-005d-4804-a223-b6e6624d3af2:NORMAL:10.210.81.45:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED]]}
size 201681322
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap
updated: 10.210.81.45:50010 is added to
blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[[DISK]DS-8d0a91de-8a69-4f39-816e-de3a0fa8a3aa:NORMAL:10.210.81.33:50010|RBW],
ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED],
ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED]]}
size 201681322
15/10/22 06:32:50 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap
updated: 10.210.81.33:50010 is added to
blk_1190230043_116737586{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[[DISK]DS-0620aef7-b6b2-4a23-950c-09373f68a815:NORMAL:10.210.64.29:50010|FINALIZED],
ReplicaUnderConstruction[[DISK]DS-52a0a4ba-cf64-4763-99a8-6c9bb5946879:NORMAL:10.210.81.45:50010|FINALIZED],
ReplicaUnderConstruction[[DISK]DS-4d937567-7184-40b7-a822-c7e3b5d588d4:NORMAL:10.210.81.33:50010|FINALIZED]]}
size 201681322
15/10/22 09:37:36 INFO BlockStateChange: BLOCK
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on
10.210.31.38:50010 by hbase4678.test.com/10.210.31.38 because reported RBW
replica with genstamp 116735085 does not match COMPLETE block's genstamp in
block map 116737586
15/10/22 09:37:36 INFO BlockStateChange: BLOCK* invalidateBlock:
blk_1190230043_116735085(stored=blk_1190230043_116737586) on 10.210.31.38:50010
15/10/22 09:37:36 INFO BlockStateChange: BLOCK* InvalidateBlocks: add
blk_1190230043_116735085 to 10.210.31.38:50010
15/10/22 09:37:39 INFO BlockStateChange: BLOCK* BlockManager: ask
10.210.31.38:50010 to delete [blk_1190230043_116735085]
15/10/22 12:45:03 INFO BlockStateChange: BLOCK* ask 10.210.64.29:50010 to
replicate blk_1190230043_116737586 to datanode(s) 10.210.64.56:50010
15/10/22 12:45:07 INFO BlockStateChange: BLOCK
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on
10.210.64.29:50010 by hbase4496.test.com/10.210.64.56 because client machine
reported it
15/10/22 12:50:49 INFO BlockStateChange: BLOCK* ask 10.210.81.45:50010 to
replicate blk_1190230043_116737586 to datanode(s) 10.210.49.49:50010
15/10/22 12:50:55 INFO BlockStateChange: BLOCK
NameSystem.addToCorruptReplicasMap: blk_1190230043 added as corrupt on
10.210.81.45:50010 by hbase4478.test.com/10.210.49.49 because client machine
reported it
15/10/22 12:56:01 WARN blockmanagement.BlockManager: PendingReplicationMonitor
timed out blk_1190230043_116737586
{code}
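To spell out the 09:37:36 lines: the stale datanode still holds the replica
at genstamp 116735085, while updatePipeline had bumped the block map to
116737586, so the report is flagged as corrupt. A simplified sketch of that
decision, with illustrative names rather than the actual BlockManager code:
{code}
// Simplified sketch (illustrative names, not the actual BlockManager
// code) of the decision logged at 09:37:36: a replica reported against a
// block that is already COMPLETE in the block map, but with a stale
// generation stamp, is marked corrupt and queued for deletion.
enum BlockState { UNDER_CONSTRUCTION, COMPLETE }

static boolean isReplicaCorrupt(BlockState storedState,
                                long storedGenStamp,
                                long reportedGenStamp) {
  // Here: storedGenStamp = 116737586 (bumped by updatePipeline),
  // reportedGenStamp = 116735085 (stale RBW replica on 10.210.31.38).
  return storedState == BlockState.COMPLETE
      && reportedGenStamp != storedGenStamp;
}
{code}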
The patch will help, but the underlying issue will still be there. Is there
some way to keep the genstamps from getting out of sync?
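For reference, here is a minimal sketch of the kind of commit-time guard the
issue title suggests; the method name and signature are illustrative, not
necessarily what HDFS-9289.1.patch actually does:
{code}
// Hedged sketch of a commit-time genstamp check (illustrative name and
// signature, not necessarily the HDFS-9289.1.patch code). Refusing to
// commit a block whose genstamp does not match the one recorded at
// updatePipeline stops a stale client from completing the file with the
// old genstamp.
static void checkGenStampOnCommit(long storedGenStamp, long clientGenStamp)
    throws java.io.IOException {
  if (clientGenStamp != storedGenStamp) {
    throw new java.io.IOException("Commit block with mismatching GS: NN has "
        + storedGenStamp + ", client submits " + clientGenStamp);
  }
  // otherwise proceed with the normal commit/complete path
}
{code}
Such a check turns the silent corruption into a hard failure at complete
time; it does not, by itself, keep the genstamps in sync.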
> check genStamp when complete file
> ---------------------------------
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Chang Li
> Assignee: Chang Li
> Attachments: HDFS-9289.1.patch
>
>
> We have seen a case of a corrupt block caused by a file completing after a
> pipelineUpdate, where the file completed with the old block genStamp. This
> caused the replicas on two datanodes in the updated pipeline to be viewed as
> corrupt. Propose to check the genstamp when committing the block.