[ 
https://issues.apache.org/jira/browse/HDFS-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HaiBin Huang updated HDFS-14783:
--------------------------------
    Description: 
The SlowPeersReport in the namenode's JMX shows which datanodes are slow. It 
is calculated from the average time one datanode takes to send packets to 
another. For example, if dn1 takes too long on average to send packets to dn2 
(over the *upperLimitLatency*), the SlowPeersReport in the namenode's JMX looks 
like this:
{code:java}
"SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}]
{code}
However, if dn1 sends only a few packets to dn2 slowly at the beginning and 
then sends no packets to dn2 for a long time, the above SlowPeersReport stays 
in the namenode's JMX. I think this SlowPeersReport may be an expired message: 
the network between dn1 and dn2 may have returned to normal, but the report 
remains in the namenode's JMX until the next time dn1 sends a packet to dn2. 
So I use a timestamp to record when an 
*org.apache.hadoop.metrics2.util.SampleStat* is created, and calculate the 
average duration only from the valid *SampleStat* instances, where validity is 
judged by the timestamp.
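
The timestamp idea can be illustrated with a minimal sketch. This is not the 
actual patch and does not use the real Hadoop classes: ExpiringLatencyAverage, 
TimedStat and validityMs are hypothetical names, and TimedStat only stands in 
for a SampleStat that remembers when it was created. The sketch also assumes 
that samples are added in non-decreasing time order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: average latency computed only from non-expired stats. */
final class ExpiringLatencyAverage {

    /** Stand-in for a SampleStat that records its creation time (assumption). */
    private static final class TimedStat {
        final long createdAtMs;
        final long numSamples;
        final double totalLatencyMs;

        TimedStat(long createdAtMs, long numSamples, double totalLatencyMs) {
            this.createdAtMs = createdAtMs;
            this.numSamples = numSamples;
            this.totalLatencyMs = totalLatencyMs;
        }
    }

    private final Deque<TimedStat> stats = new ArrayDeque<>();
    private final long validityMs; // how long a stat stays valid (hypothetical)

    ExpiringLatencyAverage(long validityMs) {
        this.validityMs = validityMs;
    }

    /** Records a batch of samples created at nowMs. */
    void add(long nowMs, long numSamples, double totalLatencyMs) {
        stats.addLast(new TimedStat(nowMs, numSamples, totalLatencyMs));
    }

    /** Returns the average over valid stats, or -1 if no valid stat remains. */
    double average(long nowMs) {
        // Drop stats whose timestamp is older than the validity window.
        while (!stats.isEmpty()
                && nowMs - stats.peekFirst().createdAtMs > validityMs) {
            stats.removeFirst();
        }
        long n = 0;
        double total = 0.0;
        for (TimedStat s : stats) {
            n += s.numSamples;
            total += s.totalLatencyMs;
        }
        return n == 0 ? -1.0 : total / n;
    }

    /** A peer is reported as slow only while valid stats exceed the limit. */
    boolean isSlow(long nowMs, double upperLimitLatencyMs) {
        return average(nowMs) > upperLimitLatencyMs;
    }
}
```

With this shape, a slow reading from dn1 stops marking dn2 as slow once the 
reading is older than the validity window, even if dn1 never sends another 
packet to dn2.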



> expired SlowPeersReport will keep staying on namenode's jmx
> -----------------------------------------------------------
>
>                 Key: HDFS-14783
>                 URL: https://issues.apache.org/jira/browse/HDFS-14783
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: HaiBin Huang
>            Priority: Major


