[https://issues.apache.org/jira/browse/HDFS-15754?focusedWorklogId=529308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-529308]

ASF GitHub Bot logged work on HDFS-15754:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Dec/20 23:12
            Start Date: 29/Dec/20 23:12
    Worklog Time Spent: 10m 
      Work Description: sunchao commented on a change in pull request #2578:
URL: https://github.com/apache/hadoop/pull/2578#discussion_r549883080



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
##########
@@ -183,6 +183,11 @@
   @Metric private MutableRate checkAndUpdateOp;
   @Metric private MutableRate updateReplicaUnderRecoveryOp;
 
+  @Metric MutableCounterLong totalPacketsReceived;

Review comment:
       We'll need to add these new metrics to [here](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Metrics.html#datanode), right?
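The names that would go into the Metrics documentation are the JMX names, not the Java field names. When an `@Metric` field is given no explicit name, Hadoop metrics2 derives the name from the field by capitalizing its first letter, which is why the field `totalPacketsReceived` is read back as `"TotalPacketsReceived"` in the test. The standalone sketch below mimics that rule. Only `totalPacketsReceived` appears in this hunk; the other three field names are assumptions inferred from the metric names the test queries.

```java
/**
 * Hypothetical helper that mimics how Hadoop metrics2 names an @Metric field
 * when no explicit name is given: the field name with its first letter capitalized.
 */
public class MetricNameSketch {

  /** Returns the field name with its first character upper-cased. */
  static String metricName(String fieldName) {
    if (fieldName.isEmpty()) {
      return fieldName;
    }
    return Character.toUpperCase(fieldName.charAt(0)) + fieldName.substring(1);
  }

  public static void main(String[] args) {
    // Only totalPacketsReceived appears in the DataNodeMetrics hunk; the other
    // three field names are assumed from the metric names the test queries.
    String[] fields = {
        "totalPacketsReceived",
        "totalPacketsSlowWriteToMirror",
        "totalPacketsSlowWriteToDisk",
        "totalPacketsSlowWriteOsCache"
    };
    for (String field : fields) {
      // e.g. "totalPacketsReceived -> TotalPacketsReceived"
      System.out.println(field + " -> " + metricName(field));
    }
  }
}
```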

##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
##########
@@ -161,6 +163,65 @@ public void testReceivePacketMetrics() throws Exception {
     }
   }
 
+  @Test
+  public void testReceivePacketSlowMetrics() throws Exception {
+    Configuration conf = new HdfsConfiguration();
+    final int interval = 1;
+    conf.set(DFSConfigKeys.DFS_METRICS_PERCENTILES_INTERVALS_KEY, "" + interval);
+    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
+        .numDataNodes(3).build();
+    try {
+      cluster.waitActive();
+      DistributedFileSystem fs = cluster.getFileSystem();
+      final DataNodeFaultInjector injector =
+          Mockito.mock(DataNodeFaultInjector.class);
+      Mockito.doAnswer(new Answer() {
+        @Override
+        public Object answer(InvocationOnMock invocationOnMock)
+            throws Throwable {
+          // make the op take longer
+          Thread.sleep(1000);
+          return null;
+        }
+      }).when(injector).stopSendingPacketDownstream(Mockito.anyString());
+      Mockito.doAnswer(new Answer() {
+        @Override
+        public Object answer(InvocationOnMock invocationOnMock)
+            throws Throwable {
+          // make the op take longer
+          Thread.sleep(1000);
+          return null;
+        }
+      }).when(injector).delayWriteToOsCache();
+      Mockito.doAnswer(new Answer() {
+        @Override
+        public Object answer(InvocationOnMock invocationOnMock)
+            throws Throwable {
+          // make the op taking longer time
+          Thread.sleep(1000);
+          return null;
+        }
+      }).when(injector).delayWriteToDisk();
+      DataNodeFaultInjector.set(injector);
+      Path testFile = new Path("/testFlushNanosMetric.txt");
+      FSDataOutputStream fout = fs.create(testFile);
+      fout.write(new byte[1]);
+      fout.hsync();
+      fout.close();
+      List<DataNode> datanodes = cluster.getDataNodes();
+      DataNode datanode = datanodes.get(0);
+      MetricsRecordBuilder dnMetrics = getMetrics(datanode.getMetrics().name());
+      assertTrue("More than 1 packet received",
+          getLongCounter("TotalPacketsReceived", dnMetrics) > 1L);
+      assertTrue("More than 1 slow packet to mirror",
+          getLongCounter("TotalPacketsSlowWriteToMirror", dnMetrics) > 1L);
+      assertCounter("TotalPacketsSlowWriteToDisk", 1L, dnMetrics);
+      assertCounter("TotalPacketsSlowWriteOsCache", 0L, dnMetrics);
+    } finally {
+      if (cluster != null) {cluster.shutdown();}

Review comment:
       nit: code style






Issue Time Tracking
-------------------

    Worklog Id:     (was: 529308)
    Time Spent: 20m  (was: 10m)

> Create packet metrics for DataNode
> ----------------------------------
>
>                 Key: HDFS-15754
>                 URL: https://issues.apache.org/jira/browse/HDFS-15754
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In BlockReceiver, slowness in writeToMirror, writeToDisk and writeToOsCache
> is currently written to the debug log. In practice we have found these to be
> quite useful signals for detecting issues in a DataNode, so it would be great
> if these metrics could be exposed through JMX.
> We also introduced a TotalPacketsReceived counter so that the slow-packet
> counts can be used as a percentage signal for detecting a potentially
> underperforming DataNode, since DataNodes across one HDFS cluster may receive
> different total numbers of packets.
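A Hadoop-free sketch of the idea in the description: count every received packet, count the packets whose mirror, disk, or OS-cache write exceeded a slowness threshold, and normalize the slow counts by the total. The class, the single shared threshold, and `recordPacket()` are hypothetical illustrations, not BlockReceiver's actual code; only the counter names are taken from the metric names used in the test.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of the HDFS-15754 packet counters. Counter names mirror
 * the metric names queried in TestDataNodeMetrics; the shared threshold and
 * recordPacket() are assumptions for illustration only.
 */
public class PacketMetricsSketch {
  private final long slowThresholdMs;

  final AtomicLong totalPacketsReceived = new AtomicLong();
  final AtomicLong totalPacketsSlowWriteToMirror = new AtomicLong();
  final AtomicLong totalPacketsSlowWriteToDisk = new AtomicLong();
  final AtomicLong totalPacketsSlowWriteOsCache = new AtomicLong();

  public PacketMetricsSketch(long slowThresholdMs) {
    this.slowThresholdMs = slowThresholdMs;
  }

  /** Counts one received packet plus each stage that exceeded the threshold. */
  public void recordPacket(long mirrorMs, long diskMs, long osCacheMs) {
    totalPacketsReceived.incrementAndGet();
    if (mirrorMs > slowThresholdMs) {
      totalPacketsSlowWriteToMirror.incrementAndGet();
    }
    if (diskMs > slowThresholdMs) {
      totalPacketsSlowWriteToDisk.incrementAndGet();
    }
    if (osCacheMs > slowThresholdMs) {
      totalPacketsSlowWriteOsCache.incrementAndGet();
    }
  }

  /** Fraction of received packets counted by slowCounter; 0.0 before any packet. */
  public double slowRatio(AtomicLong slowCounter) {
    long total = totalPacketsReceived.get();
    return total == 0 ? 0.0 : (double) slowCounter.get() / total;
  }

  public static void main(String[] args) {
    // 300 ms is an illustrative threshold, not a claim about the DataNode default.
    PacketMetricsSketch m = new PacketMetricsSketch(300);
    m.recordPacket(10, 5, 1);
    m.recordPacket(450, 5, 1);   // slow mirror write
    m.recordPacket(20, 800, 1);  // slow disk write
    m.recordPacket(15, 5, 1);
    // prints "slow to mirror: 25%, slow to disk: 25%"
    System.out.printf("slow to mirror: %.0f%%, slow to disk: %.0f%%%n",
        100 * m.slowRatio(m.totalPacketsSlowWriteToMirror),
        100 * m.slowRatio(m.totalPacketsSlowWriteToDisk));
  }
}
```

Normalizing by each node's own packet total is what makes busy and idle DataNodes comparable: 5 slow packets out of 50 (10%) is a much stronger signal than 5 out of 5,000 (0.1%).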



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
