[
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chen Liang updated HDFS-12043:
------------------------------
Attachment: HDFS-12043.003.patch
Thanks [~arpitagarwal] for the comments! Posted the v003 patch, which renames the
metrics and adds the {{if (pendingNum > 0)}} check.
> Add counters for block re-replication
> -------------------------------------
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-12043.001.patch, HDFS-12043.002.patch,
> HDFS-12043.003.patch
>
>
> We occasionally see that the under-replicated block count is not going down
> quickly enough. We've made at least one fix to speed up block replications
> (HDFS-9205) but we need better insight into the current state and activity of
> the block re-replication logic. For example, we need to understand whether
> it is because re-replication is not making forward progress at all, or
> because new under-replicated blocks are being added faster than they are replicated.
> We should include additional metrics:
> # Cumulative number of blocks that were successfully replicated.
> # Cumulative number of re-replications that timed out.
> # Cumulative number of blocks that were dequeued for re-replication but not
> scheduled, e.g. because they were invalid or under construction, or because
> replication was postponed.
>
> The growth rate of the above metrics will make it clear whether block
> replication is making forward progress and, if not, provide potential
> clues about why it is stalled.
> Thanks [~arpitagarwal] for the offline discussions.
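For illustration only, here is a minimal sketch of the three cumulative counters listed in the description, and of how their growth between two samples could separate "no forward progress" from "new under-replicated blocks arriving faster". The class, method, and enum names below are hypothetical and are not taken from the attached patches, which may name and wire the metrics differently.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: all names here are illustrative assumptions,
// not the metrics or classes added by the HDFS-12043 patches.
final class ReReplicationCounters {
    enum Progress { PROGRESSING, OUTPACED, STALLED }

    // Cumulative counters: they only ever grow, so the difference
    // between two samples gives the activity in that interval.
    private final LongAdder successful = new LongAdder();
    private final LongAdder timedOut = new LongAdder();
    private final LongAdder notScheduled = new LongAdder();

    void incrSuccessful() { successful.increment(); }
    void incrTimedOut() { timedOut.increment(); }
    void incrNotScheduled() { notScheduled.increment(); }

    // Returns {successful, timedOut, notScheduled} at the time of the call.
    long[] snapshot() {
        return new long[] { successful.sum(), timedOut.sum(), notScheduled.sum() };
    }

    // successfulDelta: growth of the "successfully replicated" counter
    // over the interval. underReplicatedDelta: change in the
    // under-replicated block count over the same interval.
    static Progress classify(long successfulDelta, long underReplicatedDelta) {
        if (underReplicatedDelta < 0) {
            return Progress.PROGRESSING; // backlog is shrinking
        }
        // Backlog is flat or growing: either replications complete but new
        // blocks arrive at least as fast, or nothing completes at all.
        return successfulDelta > 0 ? Progress.OUTPACED : Progress.STALLED;
    }
}
```

In the STALLED case, the growth of the timed-out and not-scheduled counters over the same interval would be the next place to look for clues.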