[ 
https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787209#comment-16787209
 ] 

Arpit Agarwal commented on HDDS-999:
------------------------------------

I think the DNS retry issue itself is fixed, but there is no delay between the 
retries; every attempt in the log below carries the same timestamp:

{code}
$ ozone sh volume list

2019-03-07 11:10:39 INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy14.submitRequest over null(om:9862) after 1 failover attempts. Trying to 
failover immediately.
2019-03-07 11:10:39 INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy14.submitRequest over null(om:9862) after 2 failover attempts. Trying to 
failover immediately.
2019-03-07 11:10:39 INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy14.submitRequest over null(om:9862) after 3 failover attempts. Trying to 
failover immediately.
2019-03-07 11:10:39 INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy14.submitRequest over null(om:9862) after 4 failover attempts. Trying to 
failover immediately.
2019-03-07 11:10:39 INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy14.submitRequest over null(om:9862) after 5 failover attempts. Trying to 
failover immediately.
2019-03-07 11:10:39 INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy14.submitRequest over null(om:9862) after 6 failover attempts. Trying to 
failover immediately.
2019-03-07 11:10:39 INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy14.submitRequest over null(om:9862) after 7 failover attempts. Trying to 
failover immediately.
2019-03-07 11:10:39 INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy14.submitRequest over null(om:9862) after 8 failover attempts. Trying to 
failover immediately.
2019-03-07 11:10:39 INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy14.submitRequest over null(om:9862) after 9 failover attempts. Trying to 
failover immediately.
2019-03-07 11:10:39 ERROR OMFailoverProxyProvider:235 - Failed to connect to 
OM. Attempted 10 retries and 10 failovers
2019-03-07 11:10:39 ERROR OzoneClientFactory:294 - Couldn't create protocol 
class org.apache.hadoop.ozone.client.rpc.RpcClient exception:
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
        at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
        at 
org.apache.hadoop.ozone.web.ozShell.OzoneAddress.createClient(OzoneAddress.java:111)
        at 
org.apache.hadoop.ozone.web.ozShell.volume.ListVolumeHandler.call(ListVolumeHandler.java:77)
        at 
org.apache.hadoop.ozone.web.ozShell.volume.ListVolumeHandler.call(ListVolumeHandler.java:42)
        at picocli.CommandLine.execute(CommandLine.java:919)
        at picocli.CommandLine.access$700(CommandLine.java:104)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
        at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
        at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
        at org.apache.hadoop.ozone.web.ozShell.Shell.execute(Shell.java:84)
        at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
        at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:95)
Caused by: java.net.UnknownHostException: Invalid host name: local host is: 
(unknown); destination host is: "om":9862; java.net.UnknownHostException; For 
more details see:  http://wiki.apache.org/hadoop/UnknownHost
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:768)
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:449)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
        at org.apache.hadoop.ipc.Client.call(Client.java:1403)
        at org.apache.hadoop.ipc.Client.call(Client.java:1367)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
        at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
        at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:305)
        at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.getServiceList(OzoneManagerProtocolClientSideTranslatorPB.java:1118)
        at 
org.apache.hadoop.ozone.client.rpc.RpcClient.getScmAddressForClient(RpcClient.java:211)
        at 
org.apache.hadoop.ozone.client.rpc.RpcClient.<init>(RpcClient.java:145)
        ... 20 more
Caused by: java.net.UnknownHostException
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:450)
        ... 46 more
Invalid host name: local host is: (unknown); destination host is: "om":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost
{code}
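
To illustrate what a delay between retries could look like, here is a minimal sketch in plain Java. It is not the Hadoop retry API and not a proposed patch; the class name BackoffRetry, the methods delayMillis and callWithRetries, and the numbers used in main are all made up for this example. The idea is simply to sleep for an exponentially growing, capped interval before each new attempt instead of retrying immediately.

```java
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; not part of Hadoop or Ozone.
final class BackoffRetry {

    /**
     * Delay before the retry that follows failed attempt number `attempt`
     * (0-based): baseMillis doubled `attempt` times, capped at maxMillis.
     */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Jump straight to the cap instead of doubling past it,
            // which also avoids overflowing the long.
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return delay;
    }

    /**
     * Runs `call` up to maxAttempts times. Only IOException (which includes
     * UnknownHostException) triggers a retry; anything else propagates.
     */
    static <T> T callWithRetries(Callable<T> call, int maxAttempts,
                                 long baseMillis, long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    // Wait before the next attempt rather than retrying immediately.
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated call: fails twice with UnknownHostException, then succeeds.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new UnknownHostException("om");
            }
            return "connected";
        }, 10, 10, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With a base of 100 ms and a cap of 2000 ms, the waits would be 100, 200, 400, 800, 1600, 2000, 2000, ... ms, so ten attempts would be spread over several seconds instead of landing in the same one.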


> Make the DNS resolution in OzoneManager more resilient
> ------------------------------------------------------
>
>                 Key: HDDS-999
>                 URL: https://issues.apache.org/jira/browse/HDDS-999
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Elek, Marton
>            Assignee: Siddharth Wagle
>            Priority: Major
>
> If the OzoneManager is started before SCM, the SCM DNS name may not be 
> resolvable yet. In this case the OM should retry and re-resolve the name, but 
> as of now it throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket 
> exception: java.net.SocketException: Unresolved address; For more details 
> see:  http://wiki.apache.org/hadoop/SocketException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:566)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>     at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
>     at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
>     at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
> Caused by: java.net.SocketException: Unresolved address
>     at sun.nio.ch.Net.translateToSocketException(Net.java:131)
>     at sun.nio.ch.Net.translateException(Net.java:157)
>     at sun.nio.ch.Net.translateException(Net.java:163)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:549)
>     ... 11 more
> Caused by: java.nio.channels.UnresolvedAddressException
>     at sun.nio.ch.Net.checkAddress(Net.java:101)
>     at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     ... 12 more{code}
> It should be fixed. (See also HDDS-421, which fixed the same problem on the 
> datanode side, and HDDS-907, which is the workaround until this issue is 
> resolved.)
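
A note on the "re-resolve" part of the description above: in plain Java, an InetSocketAddress is resolved once, when it is constructed, so an instance that came back unresolved stays that way until a new one is created from the same host and port. The following is a minimal sketch of that step only; the class ReResolve and its method are hypothetical names for this example, and this is not the actual HDDS-999 change.

```java
import java.net.InetSocketAddress;

// Hypothetical helper for illustration only; not part of Hadoop or Ozone.
final class ReResolve {

    /**
     * Returns `addr` unchanged if it is already resolved; otherwise builds a
     * new InetSocketAddress from the same host string and port, which makes
     * the JDK attempt name resolution again.
     */
    static InetSocketAddress reResolve(InetSocketAddress addr) {
        if (!addr.isUnresolved()) {
            return addr;
        }
        // getHostString() returns the original host name without triggering
        // a reverse lookup.
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        // An IP literal is used here so the example does not depend on DNS.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("127.0.0.1", 9862);
        InetSocketAddress fresh = reResolve(stale);
        System.out.println("stale unresolved: " + stale.isUnresolved());
        System.out.println("fresh unresolved: " + fresh.isUnresolved());
    }
}
```

A caller would typically invoke something like this before each retry, so that a DNS name that becomes available later is picked up; if the name still cannot be resolved, the returned address is again unresolved and the caller can wait and try once more.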



