[
https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Badger updated HDFS-10755:
-------------------------------
Attachment: HDFS-10755.002.patch
Attaching a patch to address the checkstyle comments. Both of the test failures
appear unrelated; they did not fail locally when I ran them with this patch.
> TestDecommissioningStatus BindException Failure
> -----------------------------------------------
>
> Key: HDFS-10755
> URL: https://issues.apache.org/jira/browse/HDFS-10755
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-10755.001.patch, HDFS-10755.002.patch
>
>
> Tests in TestDecommissioningStatus call MiniDFSCluster.dataNodeRestart(). They
> are required to come back up on the same (initially ephemeral) port that they
> were on before being shut down. Because of this, there is an inherent race
> condition: another process can bind to the port while the datanode is down,
> in which case the restart fails with a BindException. Moreover, all of the
> tests in TestDecommissioningStatus depend on the cluster being up and
> running, so if one test blows up the cluster, the subsequent tests fail as
> well. Below I show the BindException failure as well as the subsequent test
> failure that occurred.
> {noformat}
> java.net.BindException: Problem binding to [localhost:35370] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:436)
> at sun.nio.ch.Net.bind(Net.java:428)
> at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at org.apache.hadoop.ipc.Server.bind(Server.java:430)
> at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:768)
> at org.apache.hadoop.ipc.Server.<init>(Server.java:2391)
> at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498)
> at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:429)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321)
> at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037)
> at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426)
> {noformat}
> {noformat}
> java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275)
> {noformat}
> I don't think there's any way to avoid the inherent race condition in
> reacquiring the same ephemeral port, but we can certainly fix the tests so
> that a single BindException doesn't cause subsequent tests to fail.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]