[
https://issues.apache.org/jira/browse/HADOOP-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259567#comment-17259567
]
Felix N commented on HADOOP-16947:
----------------------------------
Hi [~huanghaibin], good work on the patch. I have one question:
- Currently your unit test only covers the case where the metrics expire and the
DN stops reporting the stale metrics. Can you add the case where, some time after
the metrics expire, the DN starts accumulating metrics again (the real-life
equivalent would be a DN that is taken down for repair and later added back to
the cluster)? In this case, the DN should report the updated metrics normally.
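
To make the requested scenario concrete, here is a rough, self-contained sketch. It is a simplified model and not the actual MutableRollingAverages code or the patch: the class RollingAverageSketch, its methods, and the use of 36*300_000 ms as the validity period are assumptions made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalDouble;

/** Hypothetical, simplified model of a rolling average with record expiry. */
class RollingAverageSketch {
    /** One window's sum and count, plus the time the samples were recorded. */
    private static final class Snapshot {
        final double sum;
        final long count;
        final long timestampMs;

        Snapshot(double sum, long count, long timestampMs) {
            this.sum = sum;
            this.count = count;
            this.timestampMs = timestampMs;
        }
    }

    private final int numWindows;
    private final long validityMs;
    private final Deque<Snapshot> windows = new ArrayDeque<>();
    private Snapshot last;      // most recent snapshot that held real samples
    private double pendingSum;
    private long pendingCount;

    public RollingAverageSketch(int numWindows, long validityMs) {
        this.numWindows = numWindows;
        this.validityMs = validityMs;
    }

    public void add(double value) {
        pendingSum += value;
        pendingCount++;
    }

    /**
     * Closes the current window. If no samples arrived, the previous snapshot
     * is offered again with its ORIGINAL timestamp, which models the deque
     * filling up with the same old record.
     */
    public void rollOver(long nowMs) {
        if (pendingCount > 0) {
            last = new Snapshot(pendingSum, pendingCount, nowMs);
            pendingSum = 0;
            pendingCount = 0;
        }
        if (last == null) {
            return;
        }
        if (windows.size() == numWindows) {
            windows.pollFirst();   // drop the oldest entry when full
        }
        windows.offerLast(last);
    }

    /** Averages only snapshots younger than validityMs; empty if all are stale. */
    public OptionalDouble average(long nowMs) {
        double sum = 0;
        long count = 0;
        for (Snapshot s : windows) {
            if (nowMs - s.timestampMs < validityMs) {
                sum += s.sum;
                count += s.count;
            }
        }
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(sum / count);
    }

    public static void main(String[] args) {
        final long windowMs = 300_000L;
        RollingAverageSketch avg = new RollingAverageSketch(36, 36 * windowMs);

        // Phase 1: the DN reports samples normally.
        avg.add(10.0);
        avg.add(20.0);
        avg.rollOver(windowMs);
        System.out.println("fresh:   " + avg.average(windowMs));

        // Phase 2: the DN goes quiet until the record expires.
        for (int i = 2; i <= 37; i++) {
            avg.rollOver(i * windowMs);
        }
        System.out.println("expired: " + avg.average(37 * windowMs));

        // Phase 3: the DN comes back and reports again.
        avg.add(40.0);
        avg.rollOver(38 * windowMs);
        System.out.println("resumed: " + avg.average(38 * windowMs));
    }
}
```

In this model the average is 15.0 while the samples are fresh, is empty once they are older than the validity period, and is 40.0 after the DN resumes, since the stale entries still sitting in the deque are skipped. The third phase is the case I would like the unit test to cover.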
> Stale record should be remove when MutableRollingAverages generating
> aggregate data.
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-16947
> URL: https://issues.apache.org/jira/browse/HADOOP-16947
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Haibin Huang
> Assignee: Haibin Huang
> Priority: Major
> Attachments: HADOOP-16947-001.patch, HADOOP-16947-002.patch,
> HADOOP-16947-003.patch, HADOOP-16947-004.patch, HADOOP-16947-005.patch,
> HADOOP-16947-006.patch, HADOOP-16947-007.patch, HADOOP-16947-008.patch,
> HADOOP-16947-009.patch, HDFS-14783, HDFS-14783-001.patch,
> HDFS-14783-002.patch, HDFS-14783-003.patch, HDFS-14783-004.patch,
> HDFS-14783-005.patch
>
>
> SlowPeersReport is generated from the SampleStat between two DNs, so it can
> appear in the NN's JMX like this:
> {code:java}
> "SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}]
> {code}
> In each period, MutableRollingAverages calls rollOverAvgs(), which
> generates a SumAndCount object based on the SampleStat and stores it in a
> LinkedBlockingDeque<SumAndCount>. The deque is used to generate the
> SlowPeersReport, and old entries are not removed until the deque
> is full. However, if dn1 doesn't send any packets to dn2 during the last
> 36*300_000 ms, the deque fills up with copies of the same old entry, because
> the last SampleStat never changes. I think these old SampleStats should
> be treated as expired and ignored when generating a new
> SlowPeersReport.