[
https://issues.apache.org/jira/browse/HDFS-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796663#comment-15796663
]
Xiaoyu Yao commented on HDFS-11194:
-----------------------------------
Thanks [~arpitagarwal] for working on this and all for the discussion. I have
the following comments on the production side changes. Still reviewing the unit
test changes and will post my comments on that soon.
1. BlockReceiver.java
NIT: Line 848: "&& mirrorAddr != null" can be removed.
Line 849: can be simplified with "peerMetrics.addSendPacketDownstream"
2. BPServiceActor.java
Line 1146: NIT: heatbeatTime can be changed to slowPeerReportTime or remove the
parameter by hiding the montonicNow() call inside scheduleNextSlowPeerReport().
3. DatanodeManager.java
Line 52-53: NIT: avoid import *
import org.apache.hadoop.util.*;
import org.apache.hadoop.util.Timer;
Line 180. The comments seems incomplete.
Line 212. we should instantiate slowPeerTracker only if dataNodePeerStatsEnabled
is true.
Line 1653-1660: NIT: can we tweak the code to avoid calling
slowPeers.getSlowPeers()
multiple times in the worst case and maybe avoid the if (LOG.isDebugEnabled())
with
parameterized logging?
Line 1659: can we use nodeinfo.getIpcAddr() sicne the datanode
has registered?
4. DataNodePeerMetrics.java
Line 142-143: Correct me if I'm wrong, looks like the comments is for stats Map
in Line 137.
5. DatanodeProtocol.proto
Line 398-405. This is a very good document. Can we add a field indicating the
DN aggregate mechanism? This way the NN can enforce consistent aggregation
across all the datanodes. This can be done in a separate ticket.
6. DFSConfigKeys.java
Line 677: document for dfs.datanode.slow.peers.report.interval? We can open
separate ticket for it.
7. RollingAverage.java
Great catch on some missing synchronized on rollOverAvgs.
NIT: Line 264: missing @param for minSamples
8. SlowNodeDetector.java
Line 99-108: We can make this an interface to allow different aggregation
methods
(median, 90th percentile) for outlier detection. This can be done in a separate
ticket.
We can also use Median/Percentile class from apache common to implement
different aggregation.
Line 127: we need to guard the tracing with if (LOG.isTraceEnabled()) to
avoid the implicit sorted.toString() overhead.
9. SlowPeerReports.java
Line 44: NIT: typo consistenly -> consistently
Line 144: NIT: the document needs to update to match the code which returns
a map -> sortedset of string.
Line 190: Can we make MAX_NODES_TO_REPORT configurable? This can be fixed in
a separate ticket.
> Maintain aggregated peer performance metrics on NameNode
> --------------------------------------------------------
>
> Key: HDFS-11194
> URL: https://issues.apache.org/jira/browse/HDFS-11194
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Xiaobing Zhou
> Assignee: Arpit Agarwal
> Attachments: HDFS-11194.01.patch, HDFS-11194.02.patch,
> HDFS-11194.03.patch
>
>
> The metrics collected in HDFS-10917 should be reported to and aggregated on
> NameNode as part of heart beat messages. This will make is easy to expose it
> through JMX to users who are interested in them.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]