[ https://issues.apache.org/jira/browse/HDFS-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628256#comment-16628256 ]

Ayush Saxena commented on HDFS-13927:
-------------------------------------

{noformat}
2018-09-25 06:00:28,162 [IPC Server listener on 46808] INFO  ipc.Server (Server.java:run(1153)) - IPC Server listener on 46808: starting
2018-09-25 06:00:28,165 [main] INFO  namenode.NameNode (NameNode.java:startCommonServices(815)) - NameNode RPC up at: localhost/127.0.0.1:46808

2018-09-25 06:00:28,251 [IPC Server listener on 41229] INFO  ipc.Server (Server.java:run(1153)) - IPC Server listener on 41229: starting
2018-09-25 06:00:28,254 [main] INFO  namenode.NameNode (NameNode.java:startCommonServices(815)) - NameNode RPC up at: localhost/127.0.0.1:41229
{noformat}

{noformat}
2018-09-25 06:00:28,293 [Thread-1152] WARN  datanode.DataNode (BPServiceActor.java:retrieveNamespaceInfo(235)) - Problem connecting to server: localhost/127.0.0.1:41229
2018-09-25 06:00:28,293 [Thread-1151] WARN  datanode.DataNode (BPServiceActor.java:retrieveNamespaceInfo(235)) - Problem connecting to server: localhost/127.0.0.1:46808
{noformat}

I analysed the failure logs. Because of a gap of a few milliseconds, the DN is 
not able to connect to the NameNodes, even though both NNs have started. It 
therefore gets a connection failure and sleeps for an additional 5 seconds 
before retrying the connection. So the DN takes 5 seconds plus an additional 
5 seconds to report the failed state. This millisecond gap seems to be beyond 
our control and entirely machine specific, so we have to increase the timeout 
to handle such an unlucky case. I will upload an addendum patch that increases 
the timeout.
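
To illustrate why a larger timeout costs little on machines that do not hit the 
retry, here is a minimal, self-contained sketch of a polling wait. It is not the 
code from the patch: the class name PollingWait, the method waitFor and the 
intervals used are hypothetical, and the DN is simulated by a simple clock check. 
The wait returns as soon as the condition holds, so the timeout is only an upper 
bound that has to cover the worst case (here, the initial wait plus one retry 
sleep).

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class PollingWait {

    /**
     * Hypothetical helper: polls {@code check} every {@code intervalMs} until it
     * returns true, and throws once {@code timeoutMs} has elapsed.
     * Returns the elapsed time in milliseconds when the condition was met.
     */
    public static long waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMs * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return (System.nanoTime() - start) / 1_000_000L;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the DN reporting the failed state: becomes true after
        // ~300 ms (scaled down from the 5 s + 5 s seen in the logs).
        long reportedAt = System.currentTimeMillis() + 300;
        long waited = waitFor(() -> System.currentTimeMillis() >= reportedAt, 50, 2000);
        System.out.println("condition met after about " + waited + " ms");
    }
}
```

With this pattern the test only waits as long as the DN actually needs. The 
larger timeout is only reached when the condition is never met.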

> Improve TestDataNodeMultipleRegistrations#testDNWithInvalidStorageWithHA wait
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-13927
>                 URL: https://issues.apache.org/jira/browse/HDFS-13927
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Minor
>             Fix For: 3.2.0
>
>         Attachments: HDFS-13927-01.patch, HDFS-13927-02.patch
>
>
> Replace the explicit wait in the test for the failed datanode with a wait for 
> exactly the time the process requires to confirm the status.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
