Elek, Marton created HDDS-776:
---------------------------------

             Summary: Make OM initialization resilient to dns failures
                 Key: HDDS-776
                 URL: https://issues.apache.org/jira/browse/HDDS-776
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
          Components: OM
            Reporter: Elek, Marton


The Ozone Manager (OM) is initialized with the 'ozone om --init' command, 
which connects to a running SCM.

If the SCM is unreachable because of a DNS issue, the initialization fails 
without any retry:

{code}
2018-10-31 15:36:26 ERROR OzoneManager:376 - Could not initialize OM version file
java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "releastest2-ozone-scm-0.releastest2-ozone-scm":9863; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:768)
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:449)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
        at org.apache.hadoop.ipc.Client.call(Client.java:1403)
        at org.apache.hadoop.ipc.Client.call(Client.java:1367)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy9.getScmInfo(Unknown Source)
        at org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.getScmInfo(ScmBlockLocationProtocolClientSideTranslatorPB.java:154)
        at org.apache.hadoop.ozone.om.OzoneManager.omInit(OzoneManager.java:358)
        at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:326)
        at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:265)
Caused by: java.net.UnknownHostException
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:450)
        ... 10 more
{code}

This is a problem for all containerized environments. In Kubernetes, the OM 
sometimes cannot be started. For docker-compose environments we have a 
15-second sleep to avoid this issue.

It would be great to retry in case of a DNS problem.
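
To make the proposal concrete, here is a minimal sketch of what such a retry 
could look like. It is not the actual OM code: the class DnsRetry, the method 
callWithRetry and its parameters (maxAttempts, sleepMillis) are hypothetical 
names chosen for illustration. It relies only on what the stack trace above 
shows, namely that the failure surfaces as a java.net.UnknownHostException, 
possibly wrapped in another exception.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Ozone): retries a call while DNS lookups fail. */
final class DnsRetry {

  private DnsRetry() { }

  /** Returns true if the throwable or any of its causes is an UnknownHostException. */
  static boolean isDnsFailure(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof UnknownHostException) {
        return true;
      }
    }
    return false;
  }

  /**
   * Runs the call and retries it after a fixed sleep when it fails with a DNS
   * error. Any other exception, or a DNS error on the last attempt, is rethrown.
   */
  static <T> T callWithRetry(Callable<T> call, int maxAttempts, long sleepMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        if (!isDnsFailure(e) || attempt >= maxAttempts) {
          throw e;
        }
        System.err.println("DNS lookup failed (attempt " + attempt + " of "
            + maxAttempts + "), retrying in " + sleepMillis + " ms");
        Thread.sleep(sleepMillis);
      }
    }
  }
}
```

In the OM case the wrapped call would presumably be the getScmInfo request 
made during initialization, so a temporarily unresolvable SCM host name would 
delay the init instead of failing it. A fixed sleep is used here for 
simplicity; the number of attempts and the interval would have to be 
configurable in practice.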



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
