[ https://issues.apache.org/jira/browse/HDDS-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elek, Marton updated HDDS-421:
------------------------------
    Status: Patch Available  (was: Open)

> Resilient DNS resolution in datanode-service
> ---------------------------------------------
>
>                 Key: HDDS-421
>                 URL: https://issues.apache.org/jira/browse/HDDS-421
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>             Fix For: 0.2.1
>
>         Attachments: HDDS-421-ozone-0.2.1.001.patch
>
>
> When I start big clusters on Kubernetes, I get a very typical error: if the
> DNS entry of the SCM is not yet available while the datanode boots up, the
> datanode won't connect to the SCM. It tries to reconnect, but the DNS
> resolution is not repeated.
> The problem is in InitDatanodeState.call(). It calls getSCMAddresses(),
> which creates the InetSocketAddress instances using the Hadoop utilities.
> When an InetSocketAddress is created, the Hadoop utilities try to resolve
> the address and store the result in the InetSocketAddress.
> The address may be unresolved, but InitDatanodeState.call() still starts to
> use it (connectionManager.addSCMServer), and there is no later attempt to
> resolve it.
> My small proposal is to return immediately if any of the SCM addresses is
> unresolved; the main loop of the DatanodeStateMachine will then try again
> (including the DNS resolution step).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
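A minimal sketch of the proposed early return, in plain Java (9+, for `List.of`) using only `java.net.InetSocketAddress`. The class and method names (`ScmAddressCheck`, `tryInit`, `runUntilInitialized`) are hypothetical and are not the Ozone code or the attached patch. `InetSocketAddress.createUnresolved` simulates an SCM hostname whose DNS entry does not exist yet, the `registered` list stands in for `connectionManager.addSCMServer`, and the hostname and port are illustrative.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: skip initialization while any SCM address is unresolved. */
public class ScmAddressCheck {

    /** Builds fresh addresses; each constructor call performs its own lookup. */
    static List<InetSocketAddress> createAddresses(List<String> hosts, int port) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String host : hosts) {
            result.add(new InetSocketAddress(host, port));
        }
        return result;
    }

    /** True only if every address carries a resolved InetAddress. */
    static boolean allResolved(List<InetSocketAddress> addresses) {
        for (InetSocketAddress addr : addresses) {
            if (addr.isUnresolved()) {
                return false;
            }
        }
        return true;
    }

    /** One init attempt: returns false without registering anything if a lookup failed. */
    static boolean tryInit(List<InetSocketAddress> addresses, List<InetSocketAddress> registered) {
        if (!allResolved(addresses)) {
            return false; // early return; the caller's loop retries later
        }
        registered.addAll(addresses); // stands in for connectionManager.addSCMServer
        return true;
    }

    /**
     * Simplified main loop: the addresses are re-created on every iteration,
     * so DNS resolution is repeated. Returns the successful attempt, or -1.
     */
    static int runUntilInitialized(Supplier<List<InetSocketAddress>> lookup,
                                   List<InetSocketAddress> registered, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryInit(lookup.get(), registered)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated lookup: the SCM DNS entry only "appears" on the third attempt.
        // An IP literal is parsed without a DNS query, so it is always resolved.
        Supplier<List<InetSocketAddress>> lookup = () -> ++calls[0] < 3
                ? List.of(InetSocketAddress.createUnresolved("scm-0.scm", 9861))
                : createAddresses(List.of("127.0.0.1"), 9861);
        List<InetSocketAddress> registered = new ArrayList<>();
        int attempt = runUntilInitialized(lookup, registered, 5);
        System.out.println("initialized after attempt " + attempt); // prints "initialized after attempt 3"
    }
}
```

The important choice is that the retry creates new address objects instead of reusing the old ones: an `InetSocketAddress` keeps the result of its single lookup, so retrying with the same unresolved instance would never succeed.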