[ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536996#comment-16536996
 ] 

Andrew Wang commented on HDFS-13658:
------------------------------------

Hi Kitti, thanks for working on this! IIUC, your patch calls 
updateOneReplicaBlocks in several places in BlockManager to track this 
metric. However, don't we already have this metric in LowRedundancyBlocks, via 
the size of the highest priority queue? That would also be an easy way to 
handle the EC case, since EC uses the highest priority queue for minimally 
durable blocks. Exposing the lengths of these different queues might be 
interesting more generally, since it would give more detailed insight into NN 
recovery activity. I'll also note that countNodes is a somewhat expensive 
function, so it's not good to call it frequently in the BM.
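
To make the queue-size idea concrete, here is a minimal sketch of a 
priority-leveled structure whose highest priority queue size doubles as the 
"one replica left" metric. It is a toy model under stated assumptions, not 
Hadoop's actual LowRedundancyBlocks: the class name, the three priority 
levels, the priorityFor thresholds, and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of priority-leveled low-redundancy queues (hypothetical, not Hadoop code). */
class LowRedundancyQueuesSketch {
    // Assumed levels for illustration: 0 is the most urgent.
    static final int QUEUE_HIGHEST_PRIORITY = 0;
    static final int QUEUE_VERY_LOW_REDUNDANCY = 1;
    static final int QUEUE_LOW_REDUNDANCY = 2;
    static final int LEVELS = 3;

    private final List<Set<Long>> queues = new ArrayList<>();

    LowRedundancyQueuesSketch() {
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new HashSet<>());
        }
    }

    /** Simplified priority rule (an assumption): one live replica is the most urgent case. */
    static int priorityFor(int liveReplicas, int expectedReplicas) {
        if (liveReplicas == 1 && expectedReplicas > 1) {
            return QUEUE_HIGHEST_PRIORITY;
        }
        if (liveReplicas * 3 < expectedReplicas) {
            return QUEUE_VERY_LOW_REDUNDANCY;
        }
        return QUEUE_LOW_REDUNDANCY;
    }

    /** Queues the block if it has fewer live replicas than expected; returns true if queued. */
    boolean add(long blockId, int liveReplicas, int expectedReplicas) {
        if (liveReplicas >= expectedReplicas) {
            return false; // fully replicated, nothing to track
        }
        return queues.get(priorityFor(liveReplicas, expectedReplicas)).add(blockId);
    }

    /** Removes the block from whichever queue holds it. */
    void remove(long blockId) {
        for (Set<Long> queue : queues) {
            queue.remove(blockId);
        }
    }

    /** Re-files the block after its replica count changes. */
    void update(long blockId, int liveReplicas, int expectedReplicas) {
        remove(blockId);
        add(blockId, liveReplicas, expectedReplicas);
    }

    /** Length of one queue; exposing every level gives the more generic view. */
    int size(int level) {
        return queues.get(level).size();
    }

    /** The metric falls out of existing bookkeeping, with no extra replica counting. */
    int getHighestPriorityCount() {
        return size(QUEUE_HIGHEST_PRIORITY);
    }
}
```

In this sketch the metric is a constant-time size lookup on a structure that 
is already maintained as replica counts change, so no additional scan of 
replica locations is needed just to report it.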

A few other comments:

* ClientProtocol#getStats is deprecated, so we shouldn't be putting new fields 
there. I think getReplicatedBlockStats and getECBlockGroupStats are the correct 
replacements. Similarly for the new beans: there are Replicated and EC classes, 
so the new fields shouldn't go into NameNodeMXBean.
* Do we need the fsck changes? fsck already shows the number of 
under-replicated blocks, which is a very similar sign that the cluster is not 
healthy. If an admin isn't seeing the existing fsck metric, they aren't going 
to see this one either. This would save us making the protocol changes, if 
we're just exposing new NN metrics.

> fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 
> 1 replica
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-13658
>                 URL: https://issues.apache.org/jira/browse/HDFS-13658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.1.0
>            Reporter: Kitti Nanasi
>            Assignee: Kitti Nanasi
>            Priority: Major
>         Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, 
> HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch, 
> HDFS-13658.006.patch, HDFS-13658.007.patch
>
>
> fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 
> 1 replica. We have had many cases opened in which a customer lost a disk or 
> a DN and, with it, files/blocks, because they had blocks with only 1 
> replica. We need to make customers more aware of this situation and that 
> they should take action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
