[
https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135372#comment-15135372
]
Yongjun Zhang commented on HDFS-9600:
-------------------------------------
Hi [~yangzhe1991], [~szetszwo] and [~vinayrpet],
Thanks for your earlier work here. I noticed that branch-2.6 has the following:
{code}
/**
 * Return true if there are any blocks on this node that have not
 * yet reached their replication factor. Otherwise returns false.
 */
boolean isReplicationInProgress(DatanodeDescriptor srcNode) {
  boolean status = false;
  boolean firstReplicationLog = true;
  int underReplicatedBlocks = 0;
  int decommissionOnlyReplicas = 0;
  int underReplicatedInOpenFiles = 0;
  final Iterator<? extends Block> it = srcNode.getBlockIterator();
  while (it.hasNext()) {
    final Block block = it.next();
    BlockCollection bc = blocksMap.getBlockCollection(block);
    if (bc != null) {
      NumberReplicas num = countNodes(block);
      int curReplicas = num.liveReplicas();
      int curExpectedReplicas = getReplication(block);
      if (curReplicas < curExpectedReplicas
          || !isPlacementPolicySatisfied(block)) {
{code}
The clause {{blockInfo.isComplete()}} was added to the following method but not
to the method above. Would any of you please explain why
{{blockInfo.isComplete()}} is not needed in the code above? Thanks a lot.
{code}
boolean isNeededReplication(Block b, int expected, int current) {
  BlockInfo blockInfo;
  if (b instanceof BlockInfo) {
    blockInfo = (BlockInfo) b;
  } else {
    blockInfo = getStoredBlock(b);
  }
  return blockInfo.isComplete()
      && (current < expected || !isPlacementPolicySatisfied(b));
}
{code}
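For reference, here is a minimal sketch of how the same guard could be applied
inside the branch-2.6 loop. This is purely illustrative, not from any posted
patch, and it assumes {{blocksMap.getStoredBlock()}} is accessible at this
point:
{code}
final Iterator<? extends Block> it = srcNode.getBlockIterator();
while (it.hasNext()) {
  final Block block = it.next();
  BlockCollection bc = blocksMap.getBlockCollection(block);
  if (bc != null) {
    // Hypothetical guard, mirroring isNeededReplication(): skip blocks
    // that are still under construction so their in-flux replica counts
    // are not misread as under-replication.
    BlockInfo storedBlock = blocksMap.getStoredBlock(block);
    if (storedBlock == null || !storedBlock.isComplete()) {
      continue;
    }
    NumberReplicas num = countNodes(block);
    int curReplicas = num.liveReplicas();
    int curExpectedReplicas = getReplication(block);
    if (curReplicas < curExpectedReplicas
        || !isPlacementPolicySatisfied(block)) {
      // ... existing handling unchanged ...
    }
  }
}
{code}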
> do not check replication if the block is under construction
> -----------------------------------------------------------
>
> Key: HDFS-9600
> URL: https://issues.apache.org/jira/browse/HDFS-9600
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Phil Yang
> Assignee: Phil Yang
> Priority: Critical
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch,
> HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch,
> HDFS-9600-v3.patch, HDFS-9600-v4.patch
>
>
> When appending a file, we update the pipeline to bump a new GS, and the old
> GS is considered out of date. When changing the GS, in
> BlockInfo.setGenerationStampAndVerifyReplicas we remove replicas that have
> the old GS, which means we remove all replicas, because no DN has the new GS
> until the block with the new GS is added back to blocksMap by
> DatanodeProtocol.blockReceivedAndDeleted.
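> To make the race concrete, here is a simplified, hypothetical model of that
> window (illustrative names, not the actual HDFS classes):
> {code}
> import java.util.HashSet;
> import java.util.Set;
>
> // Simplified model: a block tracks its GS and the DNs holding replicas.
> class BlockReplicaModel {
>   long genStamp;
>   final Set<String> replicas = new HashSet<>();
>
>   // Pipeline update: bump the GS and drop replicas with the old GS.
>   // No DN has reported the new GS yet, so all replicas are removed.
>   void setGenerationStampAndRemoveStaleReplicas(long newGS) {
>     genStamp = newGS;
>     replicas.clear();
>   }
>
>   // Later, blockReceivedAndDeleted re-adds replicas with the new GS.
>   void blockReceived(String dn, long reportedGS) {
>     if (reportedGS == genStamp) {
>       replicas.add(dn);
>     }
>   }
>
>   // A replication check inside the window sees zero live replicas and
>   // wrongly reports the block as missing.
>   boolean looksMissing() {
>     return replicas.isEmpty();
>   }
> }
> {code}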
> If we check the replication of this block before it is added back, it will
> be regarded as missing. The probability of this is low in general, but if
> there are decommissioning nodes, DecommissionManager.Monitor scans all
> blocks belonging to decommissioning nodes very quickly, so the probability
> of finding such a "missing" block is very high, even though the block is not
> actually missing.
> Furthermore, after closing the appended file,
> FSNamesystem.finalizeINodeFileUnderConstruction calls checkReplication. If
> some of the nodes are decommissioning, the block with the new GS will be
> added to the UnderReplicatedBlocks map, so there are two blocks with the
> same ID in this map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. As a result, many missing
> block warnings appear on the NameNode website even though there are no
> corrupt files...
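> A minimal sketch of why two entries can coexist (an assumed, simplified
> model of the per-priority queues, not the real UnderReplicatedBlocks code):
> {code}
> import java.util.ArrayList;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> // Simplified model: one set per priority level, so the same block ID
> // queued at two levels yields two entries.
> class UnderReplicatedQueuesModel {
>   static final int QUEUE_HIGHEST_PRIORITY = 0;
>   static final int QUEUE_UNDER_REPLICATED = 2;
>   static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;
>   private static final int LEVELS = 5;
>
>   private final List<Set<Long>> queues = new ArrayList<>();
>
>   UnderReplicatedQueuesModel() {
>     for (int i = 0; i < LEVELS; i++) {
>       queues.add(new HashSet<Long>());
>     }
>   }
>
>   void add(long blockId, int priority) {
>     queues.get(priority).add(blockId);
>   }
>
>   // Counts how many levels contain the block: the stale "corrupt"
>   // entry and the new under-replicated entry both survive.
>   int occurrences(long blockId) {
>     int n = 0;
>     for (Set<Long> q : queues) {
>       if (q.contains(blockId)) {
>         n++;
>       }
>     }
>     return n;
>   }
> }
> {code}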
> Therefore, I think the solution is that we should not check replication if
> the block is under construction; we should only check complete blocks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)