[ https://issues.apache.org/jira/browse/HDFS-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628256#comment-16628256 ]
Ayush Saxena commented on HDFS-13927:
-------------------------------------

{noformat}
2018-09-25 06:00:28,162 [IPC Server listener on 46808] INFO ipc.Server (Server.java:run(1153)) - IPC Server listener on 46808: starting
2018-09-25 06:00:28,165 [main] INFO namenode.NameNode (NameNode.java:startCommonServices(815)) - NameNode RPC up at: localhost/127.0.0.1:46808
2018-09-25 06:00:28,251 [IPC Server listener on 41229] INFO ipc.Server (Server.java:run(1153)) - IPC Server listener on 41229: starting
2018-09-25 06:00:28,254 [main] INFO namenode.NameNode (NameNode.java:startCommonServices(815)) - NameNode RPC up at: localhost/127.0.0.1:41229
{noformat}
{noformat}
2018-09-25 06:00:28,293 [Thread-1152] WARN datanode.DataNode (BPServiceActor.java:retrieveNamespaceInfo(235)) - Problem connecting to server: localhost/127.0.0.1:41229
2018-09-25 06:00:28,293 [Thread-1151] WARN datanode.DataNode (BPServiceActor.java:retrieveNamespaceInfo(235)) - Problem connecting to server: localhost/127.0.0.1:46808
{noformat}

I analysed the failure logs. Apparently, because of a gap of a few milliseconds, the DN is not able to connect to the NameNodes even though both NNs have started. It therefore gets a connection failure and sleeps for an additional 5 seconds before retrying the connection. So the DN takes 5 seconds plus an additional 5 seconds to report the failed state.

This millisecond-level gap is beyond our control and entirely machine-specific, so we have to increase the timeout to handle such an unfortunate case. I will upload an addendum patch that increases the timeout.
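For illustration only, the timing argument above can be sketched as a polling wait whose timeout covers the normal 5 seconds plus one extra 5-second retry sleep. This is not the code from the patch: the class `PollingWait`, the method `waitFor`, and the 50 ms / 2000 ms values in `main` are hypothetical and chosen only to keep the example fast.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a condition instead of sleeping for a fixed time. */
public class PollingWait {

    /**
     * Calls check every intervalMs until it returns true or timeoutMs elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    public static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.nanoTime();
        // Simulated condition that becomes true after roughly 300 ms,
        // standing in for "the DN has reported the failed state".
        boolean met = waitFor(() -> System.nanoTime() - start >= 300_000_000L, 50, 2000);
        System.out.println("condition met: " + met);
    }
}
```

With a wait of this shape, a larger timeout only costs time on machines where the extra retry actually happens; on a fast machine the wait returns as soon as the condition holds.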
> Improve TestDataNodeMultipleRegistrations#testDNWithInvalidStorageWithHA wait
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-13927
>                 URL: https://issues.apache.org/jira/browse/HDFS-13927
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Minor
>             Fix For: 3.2.0
>
>         Attachments: HDFS-13927-01.patch, HDFS-13927-02.patch
>
>
> Remove the explicit wait in the test for failed datanode with exact time
> required for the process to confirm the status.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)