[ https://issues.apache.org/jira/browse/HDFS-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
HaiBin Huang updated HDFS-14783:
--------------------------------
Component/s: (was: hdfs)
> expired SlowPeersReport will keep staying on namenode's jmx
> -----------------------------------------------------------
>
> Key: HDFS-14783
> URL: https://issues.apache.org/jira/browse/HDFS-14783
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: HaiBin Huang
> Priority: Major
> Attachments: HDFS-14783
>
>
> SlowPeersReport in the namenode's jmx tells us which datanodes are slow. It
> is calculated from the average duration of packet transfers between two
> datanodes. For example, if packets sent from dn1 to dn2 take too long on
> average (over the *upperLimitLatency*), you will see a SlowPeersReport in
> the namenode's jmx like this:
> {code:java}
> "SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}]
> {code}
> However, if dn1 sends some packets to dn2 slowly at first and then sends no
> packets to dn2 for a long time, the above SlowPeersReport stays on the
> namenode's jmx. I think this SlowPeersReport may be an expired message,
> because the network between dn1 and dn2 may have returned to normal, yet the
> SlowPeersReport remains on the namenode's jmx until the next time dn1 sends
> a packet to dn2. So I use a timestamp to record when an
> *org.apache.hadoop.metrics2.util.SampleStat* is created, and calculate the
> average duration using only the valid *SampleStat* instances, where validity
> is judged by the timestamp.
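
The timestamp idea described above can be sketched roughly as follows. This is not the attached patch and not Hadoop code: the class ExpiringLatencyTracker, its methods, and the validityMs window are hypothetical names chosen for illustration, and it tracks individual latency samples rather than SampleStat objects to keep the example small. Only upperLimitLatency is taken from the description.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalDouble;

/** Hypothetical helper (not part of Hadoop): averages only non-expired samples. */
final class ExpiringLatencyTracker {
    /** One latency observation plus the time it was recorded. */
    private static final class Sample {
        final double latencyMs;
        final long recordedAtMs;

        Sample(double latencyMs, long recordedAtMs) {
            this.latencyMs = latencyMs;
            this.recordedAtMs = recordedAtMs;
        }
    }

    private final long validityMs; // assumed expiry window, in milliseconds
    private final Deque<Sample> samples = new ArrayDeque<>();

    ExpiringLatencyTracker(long validityMs) {
        this.validityMs = validityMs;
    }

    /** Records a sample; callers are assumed to pass non-decreasing timestamps. */
    void add(double latencyMs, long nowMs) {
        samples.addLast(new Sample(latencyMs, nowMs));
    }

    /** Average over samples still inside the window; empty if all have expired. */
    OptionalDouble average(long nowMs) {
        // Oldest samples sit at the head, so expired ones can be dropped from there.
        while (!samples.isEmpty()
                && nowMs - samples.peekFirst().recordedAtMs > validityMs) {
            samples.removeFirst();
        }
        if (samples.isEmpty()) {
            return OptionalDouble.empty();
        }
        double sum = 0.0;
        for (Sample s : samples) {
            sum += s.latencyMs;
        }
        return OptionalDouble.of(sum / samples.size());
    }

    /** A peer is reported slow only if its valid average exceeds the limit. */
    boolean isSlow(long nowMs, double upperLimitLatencyMs) {
        OptionalDouble avg = average(nowMs);
        return avg.isPresent() && avg.getAsDouble() > upperLimitLatencyMs;
    }
}
```

With a window of 1000 ms, two slow samples recorded at t=0 and t=100 make the peer slow when checked at t=500, but not at t=5000, because by then both samples have expired and there is nothing left to average.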
--
This message was sent by Atlassian Jira
(v8.3.2#803003)