[ 
https://issues.apache.org/jira/browse/HDDS-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608575#comment-16608575
 ] 

Elek, Marton commented on HDDS-421:
-----------------------------------

Tested on Kubernetes. With this patch applied, all of the datanodes started successfully.

> Resilient DNS resolution in datanode-service 
> ---------------------------------------------
>
>                 Key: HDDS-421
>                 URL: https://issues.apache.org/jira/browse/HDDS-421
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>             Fix For: 0.2.1
>
>         Attachments: HDDS-421-ozone-0.2.1.001.patch
>
>
> When I start big clusters on Kubernetes, I regularly hit the following error:
> if the DNS entry of the SCM is not yet available while the datanode boots,
> the datanode won't connect to the SCM. It tries to reconnect, but the DNS
> resolution is not repeated.
> The problem is in InitDatanodeState.call(). It calls getSCMAddresses,
> which creates the InetSocketAddress instances using the Hadoop utilities.
> During the creation of an InetSocketAddress, the Hadoop utilities try to
> resolve the address and save the result in the InetSocketAddress.
> The address may still be unresolved, but InitDatanodeState.call() will start
> to use it anyway (connectionManager.addSCMServer), and there won't be any
> attempt to resolve it later.
> My small proposal is to return immediately if any of the SCM addresses is
> unresolved, so that the main loop of the DatanodeStateMachine tries again
> (including the DNS resolution part).
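
The check described in the issue can be illustrated with a minimal sketch. This is not the attached patch: the class ScmAddressCheck, its methods, the host name and the port are hypothetical and chosen only for illustration. The sketch relies on standard java.net behaviour: an InetSocketAddress keeps whatever resolution result it had when it was built, so a retry has to build fresh instances.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of Ozone or of the HDDS-421 patch.
final class ScmAddressCheck {

  /** Returns true only if every address already carries a resolved IP. */
  static boolean allResolved(Collection<InetSocketAddress> addresses) {
    for (InetSocketAddress address : addresses) {
      if (address.isUnresolved()) {
        return false;
      }
    }
    return true;
  }

  /**
   * Calls the lookup again on every attempt, so fresh InetSocketAddress
   * instances are created each time. Returns the number of the first
   * attempt on which all addresses were resolved, or -1 if none was.
   */
  static int attemptsUntilResolved(
      Supplier<List<InetSocketAddress>> lookup, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (allResolved(lookup.get())) {
        return attempt;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // createUnresolved never performs a DNS lookup, so this stands in
    // for an SCM host name that is not yet known to DNS.
    InetSocketAddress pending =
        InetSocketAddress.createUnresolved("scm.example.invalid", 9861);
    // An address built from an InetAddress is resolved by construction.
    InetSocketAddress ready =
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 9861);

    System.out.println(allResolved(Collections.singletonList(pending)));
    System.out.println(allResolved(Collections.singletonList(ready)));

    // Simulated DNS: the first two lookups fail, the third succeeds.
    int[] calls = {0};
    Supplier<List<InetSocketAddress>> lookup =
        () -> Collections.singletonList(++calls[0] < 3 ? pending : ready);
    System.out.println(attemptsUntilResolved(lookup, 5));
  }
}
```

In the real datanode the retry would presumably come from the DatanodeStateMachine main loop calling the init state again rather than from a loop like attemptsUntilResolved. The loop is included here only to show that each attempt has to recreate the addresses.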



