[ https://issues.apache.org/jira/browse/HDFS-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237396#comment-15237396 ]
Kihwal Lee commented on HDFS-10270:
-----------------------------------

It is not for testing the client connection being up. It is simply checking one of the metrics values reported in JMX. I don't know why NumOpenConnections was chosen. The test had worked *reliably* until the JMX caching was fixed. The values used to be available right away, but now it takes about 10 seconds. So even when the test passes, it adds about 10 more seconds of delay.

But the original author also made a wrong assumption. The assumption was that the number of connections is 2 because there are two datanodes. As you have correctly analyzed, this is not true in a MiniDFSCluster. Since the two datanodes share the same JVM, a single connection was shared for the {{DatanodeProtocol}}. An additional connection was made for the client. In a real distributed cluster, it would have been 3 connections.

I lean toward fixing the existing check rather than removing it. First, it shouldn't check against the number of datanodes, but simply against 2.

Increasing the IPC client idle timeout would make the test run time longer, which is against what we have been trying to do. An alternative is to add a test resource to reduce the JMX update interval. We could add a {{hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-metrics2.properties}} file with one line containing {{*.period=1}}. This will also reduce the run time of a number of test cases that query JMX to verify the result. What do you think?

> TestJMXGet:testNameNode() fails
> -------------------------------
>
>                 Key: HDFS-10270
>                 URL: https://issues.apache.org/jira/browse/HDFS-10270
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0, 2.8.0
>            Reporter: Andras Bokor
>            Assignee: Gergely Novák
>            Priority: Minor
>         Attachments: HDFS-10270.001.patch, TestJMXGet.log, TestJMXGetFails.log
>
>
> It fails with java.util.concurrent.TimeoutException.
> Actually the problem here is that we expect 2 as NumOpenConnections metric but it is only 1. So
> the test waits 60 sec then fails.
> Please find maven output so the stack trace attached ([^TestJMXGetFails.log]).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)