[ https://issues.apache.org/jira/browse/HDDS-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doroszlai, Attila updated HDDS-776:
-----------------------------------
    Attachment: HDDS-776.001.patch

> Make OM initialization resilient to dns failures
> ------------------------------------------------
>
>                 Key: HDDS-776
>                 URL: https://issues.apache.org/jira/browse/HDDS-776
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: OM
>            Reporter: Elek, Marton
>            Assignee: Doroszlai, Attila
>            Priority: Critical
>         Attachments: HDDS-776.001.patch
>
>
> Ozone Manager can be initialized with the 'ozone om --init' command, which
> connects to a running SCM.
> If SCM is unavailable because of a DNS issue, the initialization fails
> without any retry:
> {code}
> 2018-10-31 15:36:26 ERROR OzoneManager:376 - Could not initialize OM version file
> java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "releastest2-ozone-scm-0.releastest2-ozone-scm":9863; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:768)
>     at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:449)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>     at com.sun.proxy.$Proxy9.getScmInfo(Unknown Source)
>     at org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.getScmInfo(ScmBlockLocationProtocolClientSideTranslatorPB.java:154)
>     at org.apache.hadoop.ozone.om.OzoneManager.omInit(OzoneManager.java:358)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:326)
>     at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:265)
> Caused by: java.net.UnknownHostException
>     at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:450)
>     ... 10 more
> {code}
> This is a problem for all containerized environments. In Kubernetes, OM
> sometimes fails to start. In docker-compose environments we use a
> 15-second sleep to avoid this issue.
> It would be great to retry in case of a DNS problem.
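The retry requested above could look roughly like the following minimal sketch. It is plain Java with no Hadoop dependency, and `DnsRetry`, `callWithRetry` and `isDnsFailure` are hypothetical names: this is neither Ozone code nor the contents of HDDS-776.001.patch. The helper walks the exception's cause chain looking for a `java.net.UnknownHostException` (the trace shows one both as the top-level exception and as its cause), sleeps and retries up to a fixed number of attempts, and rethrows any other failure immediately. The attempt count and sleep interval are assumed values, not taken from the issue.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of Ozone): retries a call while it fails
 * because a hostname, e.g. the SCM address, cannot yet be resolved.
 */
public final class DnsRetry {

  private DnsRetry() {
  }

  /** True if t or any exception in its cause chain is an UnknownHostException. */
  public static boolean isDnsFailure(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof UnknownHostException) {
        return true;
      }
    }
    return false;
  }

  /**
   * Invokes call up to maxAttempts times, sleeping sleepMillis after each
   * attempt that failed with a DNS error. Non-DNS failures, and the DNS
   * failure of the last attempt, are rethrown unchanged.
   */
  public static <T> T callWithRetry(Callable<T> call, int maxAttempts,
      long sleepMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        if (!isDnsFailure(e) || attempt >= maxAttempts) {
          throw e;
        }
        System.err.printf("DNS lookup failed (attempt %d of %d), retrying in %d ms: %s%n",
            attempt, maxAttempts, sleepMillis, e);
        Thread.sleep(sleepMillis);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulated SCM request whose hostname only resolves on the third try.
    int[] calls = {0};
    String scmId = callWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new UnknownHostException("releastest2-ozone-scm-0.releastest2-ozone-scm");
      }
      return "scm-id";
    }, 5, 100);
    System.out.println(scmId + " after " + calls[0] + " attempts");
  }
}
```

In OM, the wrapped call would presumably be the `getScmInfo()` request made from `OzoneManager.omInit` in the trace. A real patch would more likely reuse Hadoop's own `org.apache.hadoop.io.retry.RetryPolicies` than a hand-rolled loop; the sketch avoids that dependency only to stay self-contained. Unlike a fixed 15-second sleep, a bounded retry proceeds as soon as the name resolves and still fails clearly if it never does.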