[jira] [Updated] (HDFS-15390) client fails forever when namenode ipaddr changed

Sean Chow (Jira) Fri, 05 Jun 2020 02:21:12 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sean Chow updated HDFS-15390:
-----------------------------
    Description: 
For machine replacement, I replace my standby namenode with a new ipaddr and 
keep the same hostname. Also update the client's hosts to make it resolve 
correctly

When I try to run failover to transite the new namenode(let's say nn2), the 
client will fail to read or write forever until it's restarted.

That make yarn nodemanager in sick state. Even the new tasks will encounter 
this exception  too. Until all nodemanager restart.

 
{code:java}
20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
nn2-192-168-1-100/192.168.1.200:9000: Connection refused
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
        at org.apache.hadoop.ipc.Client.call(Client.java:1440)
        at org.apache.hadoop.ipc.Client.call(Client.java:1401)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
{code}
 

We can see the client has {{Address change detected}}, but it still fails. I 
find out that's because when method {{updateAddress()}} return true,  the 
{{handleConnectionFailure()}} thow an exception that break the next retry with 
the right ipaddr.

 

  was:
For machine replacement, I replace my standby namenode with a new ipaddr and 
keep the same hostname. Also update the client's hosts to make it resolve 
correctly

When I try to run failover to transite the new namenode(let's say nn2), the 
client will fail to read or write forever until it's restarted.

That make yarn nodemanager in sick state. Even the new tasks will encounter 
this exception  too. Until all nodemanager restart.

 

 
{code:java}
20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
nn2-192-168-1-100/192.168.1.200:9000: Connection refused
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
        at org.apache.hadoop.ipc.Client.call(Client.java:1440)
        at org.apache.hadoop.ipc.Client.call(Client.java:1401)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
{code}
 

 

We can see the client has Address change detected, but it still fails. I find 
out that's because when method updateAddress() return ture,  the 
handleConnectionFailure() thow an exception that break the next retry with the 
right ipaddr.

 


> client fails forever when namenode ipaddr changed
> -------------------------------------------------
>
>                 Key: HDFS-15390
>                 URL: https://issues.apache.org/jira/browse/HDFS-15390
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfsclient
>    Affects Versions: 2.10.0, 2.9.2, 3.2.1
>            Reporter: Sean Chow
>            Priority: Major
>
> For machine replacement, I replace my standby namenode with a new ipaddr and 
> keep the same hostname. Also update the client's hosts to make it resolve 
> correctly
> When I try to run failover to transite the new namenode(let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That make yarn nodemanager in sick state. Even the new tasks will encounter 
> this exception  too. Until all nodemanager restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
>         at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1440)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1401)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>         at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
>         at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> {code}
>  
> We can see the client has {{Address change detected}}, but it still fails. I 
> find out that's because when method {{updateAddress()}} return true,  the 
> {{handleConnectionFailure()}} thow an exception that break the next retry 
> with the right ipaddr.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (HDFS-15390) client fails forever when namenode ipaddr changed

Reply via email to