[
https://issues.apache.org/jira/browse/HDFS-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628256#comment-16628256
]
Ayush Saxena commented on HDFS-13927:
-------------------------------------
{noformat}
2018-09-25 06:00:28,162 [IPC Server listener on 46808] INFO ipc.Server
(Server.java:run(1153)) - IPC Server listener on 46808: starting
2018-09-25 06:00:28,165 [main] INFO namenode.NameNode
(NameNode.java:startCommonServices(815)) - NameNode RPC up at:
localhost/127.0.0.1:46808
2018-09-25 06:00:28,251 [IPC Server listener on 41229] INFO ipc.Server
(Server.java:run(1153)) - IPC Server listener on 41229: starting
2018-09-25 06:00:28,254 [main] INFO namenode.NameNode
(NameNode.java:startCommonServices(815)) - NameNode RPC up at:
localhost/127.0.0.1:41229
{noformat}
{noformat}
2018-09-25 06:00:28,293 [Thread-1152] WARN datanode.DataNode
(BPServiceActor.java:retrieveNamespaceInfo(235)) - Problem connecting to
server: localhost/127.0.0.1:41229
2018-09-25 06:00:28,293 [Thread-1151] WARN datanode.DataNode
(BPServiceActor.java:retrieveNamespaceInfo(235)) - Problem connecting to
server: localhost/127.0.0.1:46808
{noformat}
Analysed the failure logs. Because of a gap of a few milliseconds, the DN is
not able to connect to the namenodes even though both NNs have started. It
therefore gets a connection failure and sleeps for an additional 5 seconds
before retrying the connection, so the DN takes 5 sec + an additional 5 seconds
to report the failed state.
This millisecond gap is beyond our control and entirely machine specific, so we
have to increase the timeout to handle such an unfortunate encounter. Will
upload an addendum patch that increases the timeout.
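
For illustration only, the kind of bounded wait discussed above can be sketched
as a small polling helper. This is not the code from the patch: the class
PollingWait, its waitFor method and the numbers in main are hypothetical names
and values chosen for the sketch. The idea is that the test checks a condition
at short intervals and gives up only after an upper bound, so an extra retry
sleep on a slow machine is absorbed as long as the bound is large enough.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not Hadoop code): polls a condition until it holds or a deadline passes. */
final class PollingWait {
  private PollingWait() {}

  /**
   * Checks the condition every checkEveryMillis until it is true.
   * Returns the elapsed time in milliseconds, or throws TimeoutException
   * once waitForMillis has passed without the condition becoming true.
   */
  static long waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long start = System.nanoTime();
    long deadline = start + waitForMillis * 1_000_000L;
    while (true) {
      if (check.getAsBoolean()) {
        return (System.nanoTime() - start) / 1_000_000L;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for "the DN has reported the failed state": becomes true after ~300 ms.
    long readyAt = System.currentTimeMillis() + 300;
    // The upper bound (2000 ms here) must cover the normal wait plus any extra retry sleep.
    long elapsed = waitFor(() -> System.currentTimeMillis() >= readyAt, 50, 2_000);
    System.out.println("condition met after ~" + elapsed + " ms");
  }
}
```

With such a helper the upper bound only matters in the slow case: a run that
connects on the first attempt returns as soon as the condition holds, while a
run that hits the extra 5 second retry sleep still passes if the bound covers
both intervals.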
> Improve TestDataNodeMultipleRegistrations#testDNWithInvalidStorageWithHA wait
> -----------------------------------------------------------------------------
>
> Key: HDFS-13927
> URL: https://issues.apache.org/jira/browse/HDFS-13927
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HDFS-13927-01.patch, HDFS-13927-02.patch
>
>
> Replace the explicit wait in the test for the failed datanode with a wait of
> exactly the time the process requires to confirm the status.