[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915106#comment-13915106 ]
Jing Zhao commented on HDFS-5167: --------------------------------- The patch looks pretty good to me. Some minor on the unit tests: # Maybe we can add the following code in TestRetryCacheWithHA#testClientRetryWithFailover: {code} long hitNN0 = cluster.getNamesystem(0).getRetryCache().getMetricsForTests() .getCacheHit(); long hitNN1 = cluster.getNamesystem(1).getRetryCache().getMetricsForTests() .getCacheHit(); Assert.assertTrue("CacheHit: " + hitNN0 + ", " + hitNN1, hitNN0 + hitNN1 > 0); long updatedNN0 = cluster.getNamesystem(0).getRetryCache() .getMetricsForTests().getCacheUpdated(); long updatedNN1 = cluster.getNamesystem(1).getRetryCache() .getMetricsForTests().getCacheUpdated(); // Cache updated metrics on NN0 should be >0 since the op was process on NN0 Assert.assertTrue("CacheUpdated on NN0: " + updatedNN0, updatedNN0 > 0); // Cache updated metrics on NN0 should be >0 since NN1 applied the editlog Assert.assertTrue("CacheUpdated on NN1: " + updatedNN1, updatedNN1 > 0); {code} In this way we can also cover the HA failover scenarios. # In TestNameNodeRetryCacheMetrics, we can put SafeMode enter/leave code into trySaveNamespace method: {code} private void trySaveNamespace() throws IOException { filesystem.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER); filesystem.saveNamespace(); filesystem.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE); } {code} # In TestNameNodeRetryCacheMetrics#cleanup, let's add an "if (cluster != null)" check before we shutdown the cluster. > Add metrics about the NameNode retry cache > ------------------------------------------ > > Key: HDFS-5167 > URL: https://issues.apache.org/jira/browse/HDFS-5167 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode > Affects Versions: 3.0.0, 2.3.0, 2.4.0 > Reporter: Jing Zhao > Assignee: Tsuyoshi OZAWA > Priority: Minor > Attachments: HDFS-5167.1.patch, HDFS-5167.10.patch, > HDFS-5167.11.patch, HDFS-5167.2.patch, HDFS-5167.3.patch, HDFS-5167.4.patch, > HDFS-5167.5.patch, HDFS-5167.6.patch, HDFS-5167.6.patch, HDFS-5167.7.patch, > HDFS-5167.8.patch, HDFS-5167.9-2.patch, HDFS-5167.9.patch > > > It will be helpful to have metrics in NameNode about the retry cache, such as > the retry count etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)