Aihua Xu created HDFS-15800:
-------------------------------

             Summary: DataNode to handle NameNode IP changes
                 Key: HDFS-15800
                 URL: https://issues.apache.org/jira/browse/HDFS-15800
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
    Affects Versions: 2.8.0
            Reporter: Aihua Xu


HADOOP-17068 handles the case where the NameNode IP address changes: the HDFS 
client updates the IP address after a connection failure.  

DataNodes use the same logic to refresh the IP address for their connections. 
Such a connection is reused until it has been idle for the default 10 seconds 
(set by ipc.client.connection.maxidletime). Once the connection is closed, the 
DataNode reconnects using the old NameNode IP address and refreshes to the new 
IP address only after the first failure.  

The problem with the refresh logic in org.apache.hadoop.ipc.Client is that the 
refreshed server value is not reflected in remoteId.address, while the next 
connection creation uses remoteId.address.

{{if (!server.equals(currentAddr)) {}}
{{  LOG.warn("Address change detected. Old: " + server.toString() +}}
{{      " New: " + currentAddr.toString());}}
{{  server = currentAddr;}}
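
The effect of refreshing only the per-connection server field can be illustrated with a minimal, self-contained sketch. This is not the actual org.apache.hadoop.ipc.Client code: the class names StaleAddressSketch, RemoteId and Connection are hypothetical stand-ins, the addresses are invented, and the current address is passed in as a parameter instead of being looked up through DNS.

```java
import java.net.InetSocketAddress;

public class StaleAddressSketch {

    // Hypothetical stand-in for the connection id; its address never changes.
    static final class RemoteId {
        private final InetSocketAddress address;

        RemoteId(InetSocketAddress address) {
            this.address = address;
        }

        InetSocketAddress getAddress() {
            return address;
        }
    }

    // Hypothetical stand-in for a single connection built from a RemoteId.
    static final class Connection {
        private InetSocketAddress server;

        Connection(RemoteId remoteId) {
            // Every new connection starts from remoteId.address.
            this.server = remoteId.getAddress();
        }

        // Mirrors the quoted snippet: only the local 'server' field is updated.
        boolean updateAddress(InetSocketAddress currentAddr) {
            if (!server.equals(currentAddr)) {
                server = currentAddr;
                return true;
            }
            return false;
        }

        InetSocketAddress getServer() {
            return server;
        }
    }

    public static void main(String[] args) {
        // Unresolved addresses keep the sketch free of DNS lookups.
        InetSocketAddress oldAddr = InetSocketAddress.createUnresolved("10.0.0.1", 8020);
        InetSocketAddress newAddr = InetSocketAddress.createUnresolved("10.0.0.2", 8020);
        RemoteId remoteId = new RemoteId(oldAddr);

        // First connection fails once, then refreshes its own 'server' field.
        Connection first = new Connection(remoteId);
        boolean changed = first.updateAddress(newAddr);
        System.out.println("first connection refreshed: " + changed);
        System.out.println("first connection uses new address: "
            + first.getServer().equals(newAddr));

        // After the idle timeout a new connection is created from the same
        // RemoteId, so it starts from the old address again.
        Connection second = new Connection(remoteId);
        System.out.println("second connection starts from old address: "
            + second.getServer().equals(oldAddr));
    }
}
```

In this sketch every connection created after the idle timeout pays one failed attempt against the old address, because the refreshed value lives only in the connection that detected the change.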

 

In a big cluster, this kind of retry causes random "BLOCK* 
blk_16987635027_18010098516 is COMMITTED but not COMPLETE(numNodes= 0 < minimum 
= 1) in file" errors if all three replicas take one retry to read/write the 
block. 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
