[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572797#comment-15572797 ]

Andrew Wang commented on HDFS-10999:
------------------------------------

We tried to draw an equivalence between the durability of EC and replicated
files by looking at the number of failures each can tolerate before data loss.
This gives us a common scale for prioritizing both types of recovery work on
the NN (see the LowRedundancyBlocks class, née UnderReplicatedBlocks).
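To make the equivalence concrete, here is a minimal Java sketch, assuming that
a replicated block survives while at least one replica is live and that an EC
block with d data units can be rebuilt from any d live internal blocks. The
class name, method names, and priority buckets are hypothetical and only
illustrative; they are not the actual thresholds used by LowRedundancyBlocks.

```java
/**
 * Hypothetical sketch: rank replicated and erasure-coded (EC) blocks on one
 * scale by how many more failures each can survive before data is lost.
 * The bucket thresholds are illustrative, NOT the NameNode's actual rules.
 */
final class RedundancyPriority {

    /** Replicated block: data survives while at least one replica is live. */
    static int remainingFailuresReplicated(int liveReplicas) {
        return Math.max(0, liveReplicas - 1);
    }

    /** EC block: assumed rebuildable from any dataUnits live internal blocks. */
    static int remainingFailuresStriped(int liveInternalBlocks, int dataUnits) {
        return Math.max(0, liveInternalBlocks - dataUnits);
    }

    /**
     * Hypothetical mapping from remaining tolerated failures to a queue index
     * (0 = most urgent). fullRedundancy is the failures tolerated at full
     * strength, e.g. 2 for 3x replication or 3 for RS-6-3.
     * Returns -1 when the block is at full redundancy and needs no queue.
     */
    static int priority(int remainingFailures, int fullRedundancy) {
        if (remainingFailures >= fullRedundancy) return -1; // fully redundant
        if (remainingFailures == 0) return 0;               // next failure loses data
        if (remainingFailures * 3 < fullRedundancy) return 1; // far below full strength
        return 2;                                           // mildly low redundancy
    }
}
```

Under these assumptions, a 3x-replicated block down to one replica and an
RS-6-3 block down to six internal blocks both have zero remaining tolerated
failures, so both land in the most urgent queue.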

I think this is mostly okay from an admin POV. In my experience, the "# under
replicated blocks" metric is used as a quick check of cluster health. If it's
non-zero, or at least not a small number, something is off and maybe you
shouldn't do a rolling restart of your cluster.

Something we might want to take a harder look at is the
pendingReconstructionBlocksCount metric. Its rate of change tells you how long
it will take your cluster to get back to full strength. However, since EC
recovery is more expensive than replication, a block count alone underspecifies
the remaining work: the cost of recovering a block also depends on its EC policy.
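A minimal Java sketch of both points, under stated assumptions: the drain
estimate is a simple two-sample linear projection, and the cost model assumes
replication reads one block while EC reconstruction reads dataUnits internal
blocks, with a single EC policy for all pending blocks. All names here are
hypothetical; none of this is an existing HDFS API.

```java
/**
 * Hypothetical sketch: project time-to-full-strength from the change in a
 * pending-reconstruction backlog, and weight the backlog by an assumed
 * read cost instead of counting every block as 1.
 */
final class RecoveryEta {

    /** Seconds until the backlog drains at the observed rate; +Inf if not draining. */
    static double secondsToDrain(double earlierBacklog, double laterBacklog,
                                 double intervalSeconds) {
        double drainPerSecond = (earlierBacklog - laterBacklog) / intervalSeconds;
        if (drainPerSecond <= 0) {
            return Double.POSITIVE_INFINITY; // backlog flat or growing
        }
        return laterBacklog / drainPerSecond;
    }

    /** Assumed bytes read to recover one block (blockBytes = internal block size for EC). */
    static long recoveryReadBytes(boolean striped, int dataUnits, long blockBytes) {
        return striped ? (long) dataUnits * blockBytes : blockBytes;
    }

    /** Backlog in bytes to read rather than in blocks, assuming one EC policy. */
    static long weightedBacklog(int replicatedBlocks, int stripedBlocks,
                                int dataUnits, long blockBytes) {
        return replicatedBlocks * recoveryReadBytes(false, dataUnits, blockBytes)
             + stripedBlocks * recoveryReadBytes(true, dataUnits, blockBytes);
    }
}
```

For example, 10 pending replicated blocks and 10 pending RS-6-3 blocks are 20
blocks by count but, under this cost model, 70 block-sized reads, so the same
count can hide very different amounts of work.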

We should also re-examine the block recovery throttles for the same reason. They
still count the number of blocks being recovered rather than the amount of I/O.
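The difference between the two kinds of throttle can be sketched in Java as
follows. The Task type, the per-task read-byte costs, and the budget are
hypothetical; the sketch only contrasts admitting a fixed number of tasks with
admitting tasks until an I/O budget is spent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch contrasting a block-count throttle with an I/O-based one. */
final class RecoveryThrottle {

    /** A pending recovery task and the bytes it is assumed to read. */
    static final class Task {
        final String id;
        final long readBytes;

        Task(String id, long readBytes) {
            this.id = id;
            this.readBytes = readBytes;
        }
    }

    /** Count-based: admit at most maxTasks tasks, regardless of their size. */
    static List<Task> byCount(List<Task> pending, int maxTasks) {
        return new ArrayList<>(pending.subList(0, Math.min(maxTasks, pending.size())));
    }

    /** I/O-based: admit tasks in priority order while total read bytes fit the budget. */
    static List<Task> byBytes(List<Task> pending, long budgetBytes) {
        List<Task> admitted = new ArrayList<>();
        long used = 0;
        for (Task t : pending) {
            // Always admit the first task so one task larger than the budget is not starved;
            // otherwise stop at the first task that would exceed the budget.
            if (!admitted.isEmpty() && used + t.readBytes > budgetBytes) {
                break;
            }
            admitted.add(t);
            used += t.readBytes;
        }
        return admitted;
    }
}
```

With two 6-unit EC tasks ahead of two 1-unit replication tasks, a count limit
of 2 admits 12 units of reads, while a budget of 8 units admits only the first
EC task. Always admitting at least one task is a design choice in this sketch,
not something the comment above prescribes.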

> Use more generic "low redundancy" blocks instead of "under replicated" blocks
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-10999
>                 URL: https://issues.apache.org/jira/browse/HDFS-10999
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Yuanbo Liu
>              Labels: supportability
>
> Per HDFS-9857, it seems that in the Hadoop 3 world, people prefer the more
> generic term "low redundancy" to the old-fashioned "under replicated". But the
> old term is still used in messages in several places, such as the web UI,
> dfsadmin, and fsck. We should probably change them to avoid confusion.
> Filing this JIRA to discuss it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)