[ 
https://issues.apache.org/jira/browse/HDFS-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-11194:
---------------------------------
    Attachment: HDFS-11194.04.patch

Thanks for the review [~xyao]. The v4 patch addresses your feedback. A few 
comments below:

bq. Line 212. we should instantiate slowPeerTracker only if 
dataNodePeerStatsEnabled is true.
We only instantiate it when dataNodePeerStatsEnabled is true. Let me know if I 
misunderstood your comment.

bq. Line 1653-1660: NIT: can we tweak the code to avoid calling 
slowPeers.getSlowPeers() multiple times in the worst case and maybe avoid the 
if (LOG.isDebugEnabled()) with parameterized logging?
Fixed. Can't change the logging since DataNodeManager has not been converted to 
slf4j yet.

bq. Line 1659: can we use nodeinfo.getIpcAddr() sicne the datanode has 
registered?
That overload is private. Did you mean use nodeInfo.getIpcAddr(true)? I used IP 
addresses everywhere since that's what the DataNodes report on their peers.

bq.  Great catch on some missing synchronized on rollOverAvgs.
The synchronization was fine since the callers always get the lock but it was 
triggering a findbugs false positive.

bq. We can make this an interface to allow different aggregation methods 
(median, 90th percentile) for outlier detection.
Thanks. We shouldn't need an interface here as the SlowNodeDetector is agnostic 
to which aggregation is used.

bq. we need to guard the tracing with if (LOG.isTraceEnabled()) to avoid the 
implicit sorted.toString() overhead.
I think the sorted.toString method will not be called unless the trace level is 
enabled. Here's the implementation of Logger#trace from Log4jLoggerAdapter:
{code}
public void trace(String format, Object... arguments) {
  if (isTraceEnabled()) {
    FormattingTuple ft = MessageFormatter.arrayFormat(format, arguments);
    logger.log(FQCN, traceCapable ? Level.TRACE : Level.DEBUG, ft
        .getMessage(), ft.getThrowable());
  }
}
{code}

bq.  Line 60: is the 300_000 a typo or special usage of timeout rule?
That's [for 
readability|https://docs.oracle.com/javase/8/docs/technotes/guides/language/underscores-literals.html]
 to help count zeros.

> Maintain aggregated peer performance metrics on NameNode
> --------------------------------------------------------
>
>                 Key: HDFS-11194
>                 URL: https://issues.apache.org/jira/browse/HDFS-11194
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Xiaobing Zhou
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-11194-03-04.delta, HDFS-11194.01.patch, 
> HDFS-11194.02.patch, HDFS-11194.03.patch, HDFS-11194.04.patch
>
>
> The metrics collected in HDFS-10917 should be reported to and aggregated on 
> NameNode as part of heart beat messages. This will make is easy to expose it 
> through JMX to users who are interested in them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to