Chen Liang created HDFS-12043:
---------------------------------
Summary: Add counters for block re-replication
Key: HDFS-12043
URL: https://issues.apache.org/jira/browse/HDFS-12043
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Chen Liang
Assignee: Chen Liang
We occasionally see that the under-replicated block count is not going down
quickly enough. We've made at least one fix to speed up block replications
(HDFS-9205), but we need better insight into the current state and activity of
the block re-replication logic. For example, we need to understand whether
re-replication is not making forward progress at all, or whether new
under-replicated blocks are being added faster than existing ones are repaired.
We should include additional metrics:
# Cumulative number of blocks that were successfully replicated.
# Cumulative number of re-replications that timed out.
# Cumulative number of blocks that were dequeued for re-replication but not
scheduled, e.g. because they were invalid or under construction, or because
replication was postponed.
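As a rough illustration of the three counters listed above, here is a minimal standalone sketch. The class name ReReplicationCounters and all method names are hypothetical and chosen only for this example; it uses plain java.util.concurrent.atomic.LongAdder rather than the Hadoop metrics classes, so it is not the proposed patch.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical holder for the three proposed cumulative counters.
 * All names here are illustrative assumptions, not actual HDFS metric names.
 */
class ReReplicationCounters {
  // Each counter only ever grows, so its growth rate can be derived
  // from any two readings taken at different times.
  private final LongAdder successful = new LongAdder();
  private final LongAdder timedOut = new LongAdder();
  private final LongAdder notScheduled = new LongAdder();

  /** A block was successfully re-replicated. */
  void incrSuccessful() { successful.increment(); }

  /** A pending re-replication timed out. */
  void incrTimedOut() { timedOut.increment(); }

  /** A block was dequeued but not scheduled (invalid, under construction, postponed). */
  void incrNotScheduled() { notScheduled.increment(); }

  long getSuccessful() { return successful.sum(); }
  long getTimedOut() { return timedOut.sum(); }
  long getNotScheduled() { return notScheduled.sum(); }
}
```

LongAdder is used in this sketch because the counters may be incremented from more than one thread; a real change would presumably hook into whatever metrics mechanism the NameNode already uses.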
The growth rate of the above metrics will make it clear whether block
replication is making forward progress and, if not, provide potential clues
about why it is stalled.
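One possible way to read the growth rates, sketched under assumptions: take two readings of a cumulative counter some time apart, compute the per-second rate, and compare the change in successful replications with the change in the under-replicated block count. The class ReplicationProgress, its Verdict values and the classification rule are hypothetical and only illustrate the reasoning; they are not part of HDFS.

```java
/**
 * Hypothetical helper that interprets two readings of the counters.
 * The names and the classification rule are assumptions for illustration.
 */
class ReplicationProgress {
  enum Verdict { STALLED, FALLING_BEHIND, CATCHING_UP }

  /** Growth rate of a cumulative counter between two readings, per second. */
  static double ratePerSecond(long before, long after, long elapsedMs) {
    if (elapsedMs <= 0) {
      throw new IllegalArgumentException("elapsedMs must be positive");
    }
    return (after - before) * 1000.0 / elapsedMs;
  }

  /**
   * @param successfulDelta      growth of the "successfully replicated" counter
   * @param underReplicatedDelta change in the under-replicated block count
   */
  static Verdict classify(long successfulDelta, long underReplicatedDelta) {
    if (successfulDelta <= 0) {
      // No block finished replicating: no forward progress at all.
      // The timed-out and not-scheduled counters may hint at the cause.
      return Verdict.STALLED;
    }
    if (underReplicatedDelta >= 0) {
      // Progress is being made, but new under-replicated blocks
      // arrive at least as fast as existing ones are repaired.
      return Verdict.FALLING_BEHIND;
    }
    return Verdict.CATCHING_UP;
  }
}
```

For example, if the successful counter grows from 100 to 400 over 60 seconds, ratePerSecond(100, 400, 60000) gives 5 blocks per second. If the under-replicated count still rose over the same interval, classify reports FALLING_BEHIND rather than STALLED.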
Thanks [~arpitagarwal] for the offline discussions.