Phil Yang created HDFS-9600:
-------------------------------
Summary: do not check replication if the block is under construction
Key: HDFS-9600
URL: https://issues.apache.org/jira/browse/HDFS-9600
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Phil Yang
Assignee: Phil Yang
Priority: Critical
When appending to a file, we update the pipeline and bump the block to a new
generation stamp (GS); the old GS is then considered out of date. When the GS
changes, BlockInfo.setGenerationStampAndVerifyReplicas removes replicas that
have the old GS. That means all replicas are removed, because no DN has the
new GS until the block with the new GS is added to blockMaps again by
DatanodeProtocol.blockReceivedAndDeleted.
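
The window described above can be illustrated with a small toy model. This is
only a sketch and not Hadoop code: the class GsBumpSketch, ToyBlock and all of
its methods are hypothetical names, and the model assumes that a replica is
tracked simply as a (DataNode, GS) pair.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these names exist in Hadoop.
final class GsBumpSketch {

    static final class ToyBlock {
        private long genStamp;
        // DataNode name -> generation stamp of the replica it reported.
        private final Map<String, Long> replicas = new HashMap<>();

        ToyBlock(long genStamp) {
            this.genStamp = genStamp;
        }

        // Assumed behaviour: bumping the GS drops every replica whose GS
        // differs from the new one, which right after the bump is all of them.
        void bumpGenerationStamp(long newGs) {
            genStamp = newGs;
            replicas.values().removeIf(gs -> gs != newGs);
        }

        // Assumed behaviour: a DataNode report is accepted only if it
        // carries the block's current GS.
        void blockReceived(String datanode, long reportedGs) {
            if (reportedGs == genStamp) {
                replicas.put(datanode, reportedGs);
            }
        }

        int liveReplicas() {
            return replicas.size();
        }
    }

    public static void main(String[] args) {
        ToyBlock block = new ToyBlock(1001L);
        block.blockReceived("dn1", 1001L);
        block.blockReceived("dn2", 1001L);
        block.blockReceived("dn3", 1001L);
        System.out.println("before append: " + block.liveReplicas());

        block.bumpGenerationStamp(1002L);
        // A replication check at this point sees no replicas at all.
        System.out.println("after GS bump: " + block.liveReplicas());

        block.blockReceived("dn1", 1002L);
        System.out.println("after first report: " + block.liveReplicas());
    }
}
```

Between the bump and the first report the toy block has zero replicas, which
is the state in which a replication check would regard it as missing.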
If we check replication of this block before it is added back, it will be
regarded as missing. The probability is normally low, but if there are
decommissioning nodes, DecommissionManager.Monitor scans all blocks belonging
to those nodes very quickly, so the probability of finding a "missing" block
is very high even though the blocks are not actually missing.
Furthermore, after the appended file is closed,
FSNamesystem.finalizeINodeFileUnderConstruction calls checkReplication and,
because some of the nodes are decommissioning, the block with the new GS is
added to the UnderReplicatedBlocks map. There are then two blocks with the
same ID in this map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in
QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. As a result, the NameNode
web UI shows many missing-block warnings although there are no corrupt files.
Therefore, I think the solution is that we should not check replication if the
block is under construction; we should only check complete blocks.
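
A minimal sketch of the proposed guard follows. It is not the actual patch:
ReplicationCheckSketch, BlockState, Verdict and checkReplication are
hypothetical names, and the two-value BlockState is an assumed simplification
of the block states the NameNode really tracks.

```java
// Illustrative only; these types do not exist in Hadoop.
final class ReplicationCheckSketch {

    // Assumed simplification: a block is either still being written or complete.
    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }

    enum Verdict { SKIPPED, MISSING, UNDER_REPLICATED, OK }

    // Only complete blocks are judged; a block under construction is skipped,
    // so a temporarily empty replica list is never reported as missing.
    static Verdict checkReplication(BlockState state, int liveReplicas, int expectedReplicas) {
        if (state != BlockState.COMPLETE) {
            return Verdict.SKIPPED;
        }
        if (liveReplicas == 0) {
            return Verdict.MISSING;
        }
        if (liveReplicas < expectedReplicas) {
            return Verdict.UNDER_REPLICATED;
        }
        return Verdict.OK;
    }

    public static void main(String[] args) {
        // Block being appended to, just after the GS bump: no replicas known yet.
        System.out.println(checkReplication(BlockState.UNDER_CONSTRUCTION, 0, 3));
        // Complete block with a replica on a decommissioning node not counted as live.
        System.out.println(checkReplication(BlockState.COMPLETE, 2, 3));
    }
}
```

With this guard, the zero-replica state that exists during an append is
skipped instead of being counted as a missing block, while complete blocks are
still checked as before.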