[ https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492489#comment-14492489 ]

Ming Ma commented on HDFS-7993:
-------------------------------

HDFS-7933 improved replica reporting for missing and under-replicated blocks
with respect to decommissioning. It appears we can build on that work to fix
the reporting of fully replicated blocks.

* Change {{report.append(" repl=" + liveReplicas);}} to
{{report.append(" repl=" + totalReplicas);}}
* Instead of using {{DatanodeInfo}} to find replica details, we can use
{{NumberReplicas}}. However, the NN has two definitions of "stale". One is a
"stale datanode": a datanode that hasn't sent a heartbeat for some time. The
other is "stale block content": the NN hasn't received a block report from that
DN since failover, which is what {{NumberReplicas#replicasOnStaleNodes}}
counts. If we need to count stale datanodes, we can add another field to
{{NumberReplicas}} for that.
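A minimal Java sketch of the two points above, under stated assumptions: {{Location}}, {{Counts}}, {{count}} and {{reportLine}} are hypothetical, simplified stand-ins, not the real NamenodeFsck or {{NumberReplicas}} API, and {{staleDatanodes}} is only the proposed extra field. It counts decommissioned replicas into the reported total so that {{repl=}} agrees with the node list, and keeps the two kinds of staleness in separate counters.

```java
import java.util.List;

public class FsckReplSketch {

  /** Hypothetical replica location; the flags mirror the states discussed above. */
  record Location(String host, boolean decommissioned,
                  boolean heartbeatStale, boolean blockContentStale) {}

  /** Simplified stand-in for NumberReplicas; staleDatanodes is the proposed extra field. */
  record Counts(int live, int decommissioned,
                int replicasOnStaleNodes, int staleDatanodes) {
    // Total replicas listed by fsck: live plus decommissioned.
    int total() { return live + decommissioned; }
  }

  static Counts count(List<Location> locs) {
    int live = 0, decom = 0, staleContent = 0, staleDn = 0;
    for (Location l : locs) {
      if (l.decommissioned()) decom++; else live++;
      // "Stale block content": no block report from this DN since NN failover.
      if (l.blockContentStale()) staleContent++;
      // "Stale datanode": no heartbeat from this DN for some time.
      if (l.heartbeatStale()) staleDn++;
    }
    return new Counts(live, decom, staleContent, staleDn);
  }

  static String reportLine(String block, long len, List<Location> locs) {
    Counts c = count(locs);
    StringBuilder report = new StringBuilder();
    report.append(block).append(" len=").append(len);
    // Report the total, so the count agrees with the node list printed next.
    report.append(" repl=").append(c.total());
    report.append(" [");
    for (int i = 0; i < locs.size(); i++) {
      if (i > 0) report.append(", ");
      report.append(locs.get(i).host());
    }
    return report.append("]").toString();
  }

  public static void main(String[] args) {
    List<Location> locs = List.of(
        new Location("dn1", false, false, false),
        new Location("dn2", false, true, false),
        new Location("dn3", false, false, true),
        new Location("dn4", true, false, false));
    System.out.println(reportLine("blk_x", 1024, locs));
    System.out.println(count(locs));
  }
}
```

With one decommissioned replica out of four, the sketch reports {{repl=4}} alongside four nodes instead of the mismatched {{repl=3}}, while the live count (3) stays available separately.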

> Incorrect descriptions in fsck when nodes are decommissioned
> ------------------------------------------------------------
>
>                 Key: HDFS-7993
>                 URL: https://issues.apache.org/jira/browse/HDFS-7993
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Ming Ma
>            Assignee: J.Andreina
>         Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch
>
>
> When you run fsck with "-files" or "-racks", you will get output like the
> following if one of the replicas is decommissioned.
> {noformat}
> blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
> {noformat}
> That is because in NamenodeFsck the repl count comes from the live replicas
> count, while the listed nodes come from LocatedBlock, which includes
> decommissioned nodes.
> Another issue in NamenodeFsck is that BlockPlacementPolicy's
> verifyBlockPlacement verifies a LocatedBlock that includes decommissioned
> nodes. It seems better to exclude the decommissioned nodes from the
> verification, just as fsck excludes decommissioned nodes when it checks for
> under-replicated blocks.
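
A hypothetical Java sketch of excluding decommissioned replicas before a placement check. The rack rule in {{placementOk}} (replicas must span min(requiredRacks, clusterRacks) distinct racks) is a simplified assumption, not the exact logic of {{BlockPlacementPolicy#verifyBlockPlacement}}, and all names here are illustrative. In the example, the decommissioned replica on /rack2 makes the block look rack-diverse even though every live replica sits on /rack1.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PlacementCheckSketch {

  /** Hypothetical replica location with its rack and decommission state. */
  record Replica(String host, String rack, boolean decommissioned) {}

  /** Keep only replicas that should count toward placement (drop decommissioned ones). */
  static List<Replica> excludeDecommissioned(List<Replica> replicas) {
    return replicas.stream()
        .filter(r -> !r.decommissioned())
        .collect(Collectors.toList());
  }

  /** Assumed, simplified rule: replicas must span min(requiredRacks, clusterRacks) racks. */
  static boolean placementOk(List<Replica> replicas, int requiredRacks, int clusterRacks) {
    Set<String> racks = new HashSet<>();
    for (Replica r : replicas) {
      racks.add(r.rack());
    }
    return racks.size() >= Math.min(requiredRacks, clusterRacks);
  }

  public static void main(String[] args) {
    List<Replica> replicas = List.of(
        new Replica("dn1", "/rack1", false),
        new Replica("dn2", "/rack1", false),
        new Replica("dn3", "/rack1", false),
        new Replica("dn4", "/rack2", true));
    // Counting the decommissioned replica hides the single-rack placement.
    System.out.println("with decommissioned: " + placementOk(replicas, 2, 2));
    System.out.println("live only: " + placementOk(excludeDecommissioned(replicas), 2, 2));
  }
}
```

Filtering before the check, rather than inside the rule, keeps the placement rule itself unchanged and mirrors how fsck already skips decommissioned nodes for under-replication.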



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
