[ 
https://issues.apache.org/jira/browse/HDFS-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237396#comment-15237396
 ] 

Kihwal Lee commented on HDFS-10270:
-----------------------------------

It is not testing whether the client connection is up. It simply checks one 
of the metric values reported in JMX. I don't know why 
NumOpenConnections was chosen.  The test worked *reliably* until the JMX 
caching was fixed. The values used to be available right away, but now they 
take about 10 seconds to appear. So even when the test passes, it adds about 
10 seconds of delay. 

But the original author also made a wrong assumption.  The assumption was that 
the number of connections is 2 because there are two 
datanodes. As you have correctly analyzed, this is not true in a 
MiniDFSCluster.  Since the two datanodes are sharing the same JVM, a single 
connection was shared for the {{DatanodeProtocol}}. An additional connection 
was made for the client. In a real distributed cluster, it would have been 3 
connections.

I lean toward fixing the existing check rather than removing it. First, it 
shouldn't check against the number of datanodes, but simply against 2. As for 
increasing the IPC client idle timeout, that would make the test run longer, 
which goes against what we have been trying to do.  An alternative is to add a 
test resource that reduces the JMX update interval.  We could add a 
{{hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-metrics2.properties}}
 file with one line containing {{*.period=1}}.  This would also reduce the run 
time of a number of test cases that query JMX to verify their results.
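
As a minimal sketch, the whole test resource could look like the block below. 
The comments are only illustrative, and the unit is an assumption on my part: 
I am taking the period to be in seconds, which would be consistent with the 
roughly 10 second delay seen now.

{noformat}
# hadoop-metrics2.properties (test resource, sketch only)
# Assumption: the period is in seconds, so metrics would be refreshed about
# once per second instead of about every 10 seconds.
*.period=1
{noformat}

The wildcard prefix applies the setting to every metrics prefix, so any test 
in the module that reads metrics through JMX would pick it up without 
per-test configuration.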

What do you think?

> TestJMXGet:testNameNode() fails
> -------------------------------
>
>                 Key: HDFS-10270
>                 URL: https://issues.apache.org/jira/browse/HDFS-10270
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0, 2.8.0
>            Reporter: Andras Bokor
>            Assignee: Gergely Novák
>            Priority: Minor
>         Attachments: HDFS-10270.001.patch, TestJMXGet.log, TestJMXGetFails.log
>
>
> It fails with java.util.concurrent.TimeoutException. Actually the problem 
> here is that we expect 2 as NumOpenConnections metric but it is only 1. So 
> the test waits 60 sec then fails.
> Please find the Maven output with the stack trace attached ([^TestJMXGetFails.log]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
