[
https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Badger updated HDFS-10755:
-------------------------------
Attachment: HDFS-10755.002.patch
Attaching a patch to address the checkstyle comments. Both of the test failures
appear unrelated; they did not fail locally when I ran them with this patch.
> TestDecommissioningStatus BindException Failure
> -----------------------------------------------
>
> Key: HDFS-10755
> URL: https://issues.apache.org/jira/browse/HDFS-10755
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-10755.001.patch, HDFS-10755.002.patch
>
>
> Tests in TestDecommissioningStatus call MiniDFSCluster.dataNodeRestart(). They
> are required to come back up on the same (initially ephemeral) port that they
> were on before being shut down. Because of this, there is an inherent race
> condition: another process can bind to the port while the datanode is down,
> in which case the restart fails with a BindException. Moreover, all of the
> tests in TestDecommissioningStatus depend on the cluster being up and
> running, so if one test blows up the cluster, the subsequent tests fail as
> well. Below I show the BindException failure as well as the subsequent test
> failure that occurred.
> {noformat}
> java.net.BindException: Problem binding to [localhost:35370] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:436)
> at sun.nio.ch.Net.bind(Net.java:428)
> at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at org.apache.hadoop.ipc.Server.bind(Server.java:430)
> at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:768)
> at org.apache.hadoop.ipc.Server.<init>(Server.java:2391)
> at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498)
> at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:429)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321)
> at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037)
> at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426)
> {noformat}
> {noformat}
> java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275)
> {noformat}
> I don't think there's any way to avoid the inherent race condition in
> reacquiring the same ephemeral port, but we can certainly fix the tests so
> that a single BindException doesn't cause subsequent tests to fail.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]