[ 
https://issues.apache.org/jira/browse/HDFS-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796663#comment-15796663
 ] 

Xiaoyu Yao commented on HDFS-11194:
-----------------------------------

Thanks [~arpitagarwal] for working on this and all for the discussion. I have 
the following comments on the production side changes. Still reviewing the unit 
test changes and will post my comments on that soon.

1. BlockReceiver.java
NIT: Line 848: "&& mirrorAddr != null" can be removed.
Line 849: can be simplified with "peerMetrics.addSendPacketDownstream"

2. BPServiceActor.java
Line 1146: NIT: heatbeatTime can be changed to slowPeerReportTime or remove the 
parameter by hiding the montonicNow() call inside scheduleNextSlowPeerReport().

3. DatanodeManager.java
Line 52-53: NIT: avoid import *
import org.apache.hadoop.util.*;
import org.apache.hadoop.util.Timer; 

Line 180. The comments seems incomplete.

Line 212. we should instantiate slowPeerTracker only if dataNodePeerStatsEnabled
is true.

Line 1653-1660: NIT: can we tweak the code to avoid calling 
slowPeers.getSlowPeers()
multiple times in the worst case and maybe avoid the if (LOG.isDebugEnabled()) 
with
parameterized logging?

Line 1659: can we use nodeinfo.getIpcAddr() sicne the datanode
has registered? 

4. DataNodePeerMetrics.java
Line 142-143: Correct me if I'm wrong, looks like the comments is for stats Map 
in Line 137.

5. DatanodeProtocol.proto
Line 398-405. This is a very good document. Can we add a field indicating the 
DN aggregate mechanism? This way the NN can enforce consistent aggregation 
across all the datanodes. This can be done in a separate ticket. 

6. DFSConfigKeys.java
Line 677: document for dfs.datanode.slow.peers.report.interval? We can open 
separate ticket for it.

7. RollingAverage.java
Great catch on some missing synchronized on rollOverAvgs. 
NIT: Line 264: missing @param for minSamples


8. SlowNodeDetector.java 
Line 99-108: We can make this an interface to allow different aggregation 
methods 
(median, 90th percentile) for outlier detection. This can be done in a separate 
ticket.
We can also use Median/Percentile class from apache common to implement
different aggregation.

Line 127: we need to guard the tracing with if (LOG.isTraceEnabled()) to 
avoid the implicit sorted.toString() overhead.

9. SlowPeerReports.java
Line 44: NIT: typo consistenly -> consistently
Line 144: NIT: the document needs to update to match the code which returns
a map -> sortedset of string.
Line 190: Can we make MAX_NODES_TO_REPORT configurable? This can be fixed in 
a separate ticket.


> Maintain aggregated peer performance metrics on NameNode
> --------------------------------------------------------
>
>                 Key: HDFS-11194
>                 URL: https://issues.apache.org/jira/browse/HDFS-11194
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Xiaobing Zhou
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-11194.01.patch, HDFS-11194.02.patch, 
> HDFS-11194.03.patch
>
>
> The metrics collected in HDFS-10917 should be reported to and aggregated on 
> NameNode as part of heart beat messages. This will make is easy to expose it 
> through JMX to users who are interested in them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to