[
https://issues.apache.org/jira/browse/HDDS-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608575#comment-16608575
]
Elek, Marton commented on HDDS-421:
-----------------------------------
Tested with Kubernetes. All of the datanodes started successfully with this patch.
> Resilient DNS resolution in datanode-service
> ---------------------------------------------
>
> Key: HDDS-421
> URL: https://issues.apache.org/jira/browse/HDDS-421
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-421-ozone-0.2.1.001.patch
>
>
> When I start big clusters on Kubernetes, I regularly hit the same error:
> if the DNS name of the SCM is not yet available while the datanode boots,
> the datanode won't connect to the SCM. It tries to reconnect, but the DNS
> resolution is not repeated.
> The problem is in InitDatanodeState.call(). It calls getSCMAddresses,
> which creates the InetSocketAddress instances using the Hadoop utilities.
> During the creation of each InetSocketAddress, the Hadoop utilities try to
> resolve the address and save the result in the InetSocketAddress.
> The address may be unresolved, but InitDatanodeState.call() will start to
> use it (connectionManager.addSCMServer) and there won't be any attempt to
> resolve it later.
> My small proposal is to return immediately if any of the SCM addresses is
> unresolved, so the main loop of the DatanodeStateMachine will try again
> (together with the DNS resolution part).
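
The check proposed above can be sketched with plain java.net classes. This is only an illustration of the idea, not the code from the attached patch: the class ScmAddressCheck and its methods anyUnresolved and resolveAgain are hypothetical names, and the real change would live inside InitDatanodeState.call() and use the Hadoop utilities instead.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Ozone: illustrates the
// "skip this round if any SCM address is unresolved" idea.
public class ScmAddressCheck {

    // Returns true if at least one address carries no resolved IP.
    // An InetSocketAddress resolves its host name once, at construction,
    // and never retries on its own.
    static boolean anyUnresolved(List<InetSocketAddress> addresses) {
        for (InetSocketAddress addr : addresses) {
            if (addr.isUnresolved()) {
                return true;
            }
        }
        return false;
    }

    // Rebuilds every address from its host name and port, which makes
    // the JDK attempt the DNS lookup again. A caller would invoke this
    // on the next iteration of its retry loop.
    static List<InetSocketAddress> resolveAgain(List<InetSocketAddress> addresses) {
        List<InetSocketAddress> fresh = new ArrayList<>();
        for (InetSocketAddress addr : addresses) {
            fresh.add(new InetSocketAddress(addr.getHostString(), addr.getPort()));
        }
        return fresh;
    }
}
```

A caller in the spirit of the proposal would return early when anyUnresolved(...) is true and leave the addresses unregistered, so that the next pass of the outer loop builds them again and repeats the lookup.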