[ 
https://issues.apache.org/jira/browse/HDFS-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425290#comment-13425290
 ] 

Andy Isaacson commented on HDFS-2554:
-------------------------------------

Thanks for the comment. I need to think about most of it, but I have one 
immediate response to the last paragraph:
bq. split {{CorruptBlocksRN}} into c<r and c==r
I don't buy it, because there's no fundamental difference between these two 
cases. In either case, all of the replicas the NN knows about are corrupt. The 
block may have been under-replicated when we discovered that all the existing 
replicas are corrupt (in which case r=3 c=2, but the block should still 
semantically be counted in your CriticallyCorrupt bucket), or the block may 
have been over-replicated when we discovered the corruption (so r=3 c=4 is 
possible). In all cases the administrator action is the same.
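
To make that concrete, here is a rough sketch of the test I have in mind. 
This is not NameNode code: the class name, the method name, and the idea of 
passing in a count of live (non-corrupt) replicas are all assumptions made for 
illustration. The point is only that the desired replication r never enters 
the decision, so c<r, c==r and c>r all land in the same bucket.

```java
// Hypothetical illustration only; none of these names exist in HDFS.
final class CorruptBlockCheck {
    private CorruptBlockCheck() {}

    /**
     * Returns true when every replica the NN knows about is corrupt.
     *
     * @param corruptReplicas c, the number of replicas reported corrupt
     * @param liveReplicas    the number of known replicas that are not corrupt
     *
     * The desired replication r is deliberately not a parameter: whether the
     * block was under-, exactly, or over-replicated does not change the answer.
     */
    public static boolean allKnownReplicasCorrupt(int corruptReplicas, int liveReplicas) {
        return corruptReplicas > 0 && liveReplicas == 0;
    }

    public static void main(String[] args) {
        // r=3 c=2, no live replicas: under-replicated, still fully corrupt.
        System.out.println(allKnownReplicasCorrupt(2, 0));
        // r=3 c=3, no live replicas: the c==r case.
        System.out.println(allKnownReplicasCorrupt(3, 0));
        // r=3 c=4, no live replicas: over-replicated, still fully corrupt.
        System.out.println(allKnownReplicasCorrupt(4, 0));
        // r=3 c=1 with two live replicas: a good copy exists, so not this bucket.
        System.out.println(allKnownReplicasCorrupt(1, 2));
    }
}
```

The first three calls describe the same situation from the administrator's 
point of view; only the last one differs.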

Keep in mind that the driving idea behind this change is that there are 
different recommended actions for an administrator responding to each of these 
4 categories.  Simply multiplying metrics because we are able to count them is 
not a benefit.
                
> Add separate metrics for missing blocks with desired replication level 1
> ------------------------------------------------------------------------
>
>                 Key: HDFS-2554
>                 URL: https://issues.apache.org/jira/browse/HDFS-2554
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Andy Isaacson
>            Priority: Minor
>
> Some users use replication level set to 1 for datasets which are unimportant 
> and can be lost with no worry (eg the output of terasort tests). But other 
> data on the cluster is important and should not be lost. It would be useful 
> to separate the metric for missing blocks by the desired replication level of 
> those blocks, so that one could ignore missing blocks at repl 1 while still 
> alerting on missing blocks with higher desired replication.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
