Eric Badger created HDFS-10755:
----------------------------------
Summary: TestDecommissioningStatus BindException Failure
Key: HDFS-10755
URL: https://issues.apache.org/jira/browse/HDFS-10755
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger
Tests in TestDecommissioningStatus call MiniDFSCluster.restartDataNode(). The
restarted datanodes are required to come back up on the same (initially
ephemeral) port that they were using before being shut down. Because of this,
there is an inherent race condition where another process could bind to the
port while the datanode is down. If that happens, we get a BindException
failure. Furthermore, all of the tests in TestDecommissioningStatus depend on
the cluster being up and running in order to pass, so if one test blows up the
cluster, the subsequent tests will also fail. Below I show the BindException
failure as well as the subsequent test failure that occurred.
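For reference, the restart pattern in question boils down to something like the
following sketch (illustrative only, not the exact code in
TestDecommissioningStatus; the class and variable names here are made up):
{noformat}
// Minimal sketch of the restart-on-the-same-port pattern (illustrative only;
// the real tests in TestDecommissioningStatus do considerably more work).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.MiniDFSCluster.DataNodeProperties;

public class RestartSamePortSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
    try {
      cluster.waitActive();
      // Stopping the datanode releases its (ephemeral) ports back to the OS.
      DataNodeProperties dnProps = cluster.stopDataNode(0);
      // While the datanode is down, any other process on the host can grab the
      // freed port. keepPort=true asks for a rebind to the old port, which is
      // the call that can fail with a BindException if the port was taken.
      cluster.restartDataNode(dnProps, true);
      cluster.waitActive();
    } finally {
      cluster.shutdown();
    }
  }
}
{noformat}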
{noformat}
java.net.BindException: Problem binding to [localhost:35370] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.Net.bind(Net.java:428)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.apache.hadoop.ipc.Server.bind(Server.java:430)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:768)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:2391)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:429)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321)
        at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037)
        at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426)
{noformat}
{noformat}
java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275)
{noformat}
I don't think there's any way to avoid the inherent race condition of rebinding
to the same ephemeral port, but we can definitely fix the tests so that a
single BindException failure doesn't cause the subsequent tests to fail as
well.
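One possible way to contain the damage (a rough sketch, not necessarily the
patch that will eventually be posted; it assumes the cluster is shared across
tests via static state, and the method and constant names below are made up)
is to verify the cluster's health before each test and rebuild it if an
earlier test left it broken:
{noformat}
// Hypothetical per-test guard: if an earlier test's datanode restart died with
// a BindException and left the shared cluster short a datanode, rebuild the
// cluster instead of letting every remaining test fail on the stale state.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Before;

public class DecommissioningStatusSetupSketch {
  private static final int NUM_DATANODES = 2;
  private static Configuration conf = new HdfsConfiguration();
  private static MiniDFSCluster cluster;

  @Before
  public void ensureClusterIsHealthy() throws Exception {
    // Assumes a failed restart manifests as a missing datanode; adjust as needed.
    if (cluster == null || cluster.getDataNodes().size() < NUM_DATANODES) {
      if (cluster != null) {
        cluster.shutdown();
      }
      cluster = new MiniDFSCluster.Builder(conf).numDataNodes(NUM_DATANODES).build();
      cluster.waitActive();
    }
  }
}
{noformat}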