[
https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787209#comment-16787209
]
Arpit Agarwal commented on HDDS-999:
------------------------------------
I think the DNS retry issue itself is fixed, but there is no delay between the
retries. Every failover attempt in the log below carries the same timestamp
(11:10:39):
{code}
$ ozone sh volume list
2019-03-07 11:10:39 INFO RetryInvocationHandler:411 -
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid
host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost, while invoking
$Proxy14.submitRequest over null(om:9862) after 1 failover attempts. Trying to
failover immediately.
2019-03-07 11:10:39 INFO RetryInvocationHandler:411 -
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid
host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost, while invoking
$Proxy14.submitRequest over null(om:9862) after 2 failover attempts. Trying to
failover immediately.
2019-03-07 11:10:39 INFO RetryInvocationHandler:411 -
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid
host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost, while invoking
$Proxy14.submitRequest over null(om:9862) after 3 failover attempts. Trying to
failover immediately.
2019-03-07 11:10:39 INFO RetryInvocationHandler:411 -
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid
host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost, while invoking
$Proxy14.submitRequest over null(om:9862) after 4 failover attempts. Trying to
failover immediately.
2019-03-07 11:10:39 INFO RetryInvocationHandler:411 -
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid
host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost, while invoking
$Proxy14.submitRequest over null(om:9862) after 5 failover attempts. Trying to
failover immediately.
2019-03-07 11:10:39 INFO RetryInvocationHandler:411 -
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid
host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost, while invoking
$Proxy14.submitRequest over null(om:9862) after 6 failover attempts. Trying to
failover immediately.
2019-03-07 11:10:39 INFO RetryInvocationHandler:411 -
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid
host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost, while invoking
$Proxy14.submitRequest over null(om:9862) after 7 failover attempts. Trying to
failover immediately.
2019-03-07 11:10:39 INFO RetryInvocationHandler:411 -
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid
host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost, while invoking
$Proxy14.submitRequest over null(om:9862) after 8 failover attempts. Trying to
failover immediately.
2019-03-07 11:10:39 INFO RetryInvocationHandler:411 -
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid
host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost, while invoking
$Proxy14.submitRequest over null(om:9862) after 9 failover attempts. Trying to
failover immediately.
2019-03-07 11:10:39 ERROR OMFailoverProxyProvider:235 - Failed to connect to
OM. Attempted 10 retries and 10 failovers
2019-03-07 11:10:39 ERROR OzoneClientFactory:294 - Couldn't create protocol
class org.apache.hadoop.ozone.client.rpc.RpcClient exception:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
at
org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
at
org.apache.hadoop.ozone.web.ozShell.OzoneAddress.createClient(OzoneAddress.java:111)
at
org.apache.hadoop.ozone.web.ozShell.volume.ListVolumeHandler.call(ListVolumeHandler.java:77)
at
org.apache.hadoop.ozone.web.ozShell.volume.ListVolumeHandler.call(ListVolumeHandler.java:42)
at picocli.CommandLine.execute(CommandLine.java:919)
at picocli.CommandLine.access$700(CommandLine.java:104)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
at
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
at org.apache.hadoop.ozone.web.ozShell.Shell.execute(Shell.java:84)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:95)
Caused by: java.net.UnknownHostException: Invalid host name: local host is:
(unknown); destination host is: "om":9862; java.net.UnknownHostException; For
more details see: http://wiki.apache.org/hadoop/UnknownHost
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:768)
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:449)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
at
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:305)
at
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.getServiceList(OzoneManagerProtocolClientSideTranslatorPB.java:1118)
at
org.apache.hadoop.ozone.client.rpc.RpcClient.getScmAddressForClient(RpcClient.java:211)
at
org.apache.hadoop.ozone.client.rpc.RpcClient.<init>(RpcClient.java:145)
... 20 more
Caused by: java.net.UnknownHostException
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:450)
... 46 more
Invalid host name: local host is: (unknown); destination host is: "om":9862;
java.net.UnknownHostException; For more details see:
http://wiki.apache.org/hadoop/UnknownHost
{code}
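For illustration only, a delay between attempts could look like the plain-Java sketch below. This is not the Hadoop RetryInvocationHandler or OMFailoverProxyProvider code; the class RetryWithDelay, its method names, and the base/cap parameters are hypothetical and exist only for this example. It retries on IOException (UnknownHostException is a subclass) and sleeps for a capped, doubling delay instead of failing over immediately.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retry an operation with a capped exponential delay. */
public final class RetryWithDelay {
    private RetryWithDelay() {}

    /** Delay after the n-th failed attempt: base, 2*base, 4*base, ... capped at maxMillis. */
    public static long delayMillis(int failedAttempts, long baseMillis, long maxMillis) {
        if (failedAttempts <= 0) {
            return 0L;
        }
        long delay = baseMillis;
        for (int i = 1; i < failedAttempts && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Runs op up to maxAttempts times, sleeping between failed attempts. */
    public static <T> T call(Callable<T> op, int maxAttempts,
                             long baseMillis, long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                // Wait before the next attempt; no sleep after the final failure.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

With a base of 100 ms and a cap of 2000 ms, the waits would be 100, 200, 400, 800, 1600 and then 2000 ms, so ten attempts would be spread over several seconds rather than a single one.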
> Make the DNS resolution in OzoneManager more resilient
> ------------------------------------------------------
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Elek, Marton
> Assignee: Siddharth Wagle
> Priority: Major
>
> If the OzoneManager is started before SCM, the SCM DNS name may not be
> resolvable yet. In this case the OM should retry and re-resolve the DNS name,
> but as of now it throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket
> exception: java.net.SocketException: Unresolved address; For more details
> see: http://wiki.apache.org/hadoop/SocketException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
> at org.apache.hadoop.ipc.Server.bind(Server.java:566)
> at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
> at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
> at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
> at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
> at
> org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
> at
> org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
> at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
> at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
> at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
> Caused by: java.net.SocketException: Unresolved address
> at sun.nio.ch.Net.translateToSocketException(Net.java:131)
> at sun.nio.ch.Net.translateException(Net.java:157)
> at sun.nio.ch.Net.translateException(Net.java:163)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
> at org.apache.hadoop.ipc.Server.bind(Server.java:549)
> ... 11 more
> Caused by: java.nio.channels.UnresolvedAddressException
> at sun.nio.ch.Net.checkAddress(Net.java:101)
> at
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> ... 12 more{code}
> It should be fixed. (See also HDDS-421, which fixed the same problem on the
> datanode side, and HDDS-907, which is the workaround until this issue is
> resolved.)
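As a rough sketch of the "retry and re-resolve" idea in the description: java.net.InetSocketAddress attempts the lookup once, in its constructor, so an unresolved instance stays unresolved and a new instance has to be created to look the name up again. The class ReResolve and its parameters below are hypothetical, not OzoneManager code. Note that the JVM may also cache failed lookups for a while (see the networkaddress.cache.negative.ttl security property), which this sketch does not address.

```java
import java.net.InetSocketAddress;

/** Hypothetical helper: re-resolve a host name until it resolves or attempts run out. */
public final class ReResolve {
    private ReResolve() {}

    public static InetSocketAddress resolveWithRetry(String host, int port,
                                                     int maxAttempts, long delayMillis)
            throws InterruptedException {
        // The constructor performs the first lookup attempt.
        InetSocketAddress addr = new InetSocketAddress(host, port);
        for (int attempt = 1; addr.isUnresolved() && attempt < maxAttempts; attempt++) {
            Thread.sleep(delayMillis);
            // A fresh instance triggers a fresh lookup; the old one never changes.
            addr = new InetSocketAddress(host, port);
        }
        // May still be unresolved; the caller should check isUnresolved().
        return addr;
    }
}
```

The caller would check isUnresolved() on the result and only then bind or connect, instead of failing on the first unresolved address.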