Aihua Xu created HDFS-15800:
-------------------------------
Summary: DataNode to handle NameNode IP changes
Key: HDFS-15800
URL: https://issues.apache.org/jira/browse/HDFS-15800
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 2.8.0
Reporter: Aihua Xu
HADOOP-17068 handles the case of a NameNode IP address change: the HDFS client
updates the IP address after a connection failure.
DataNodes use the same logic to refresh the IP address for their connections.
Such a connection is reused until it has been idle for longer than the default
of 10 seconds (set by ipc.client.connection.maxidletime). If the connection is
closed, the DataNode will use the old NameNode IP address to reconnect, and
will refresh to the new IP address only after the first failure.
The problem with the refresh logic in org.apache.hadoop.ipc.Client is that the
refreshed server value is not reflected in remoteId.address, while the next
connection creation still uses remoteId.address.
    if (!server.equals(currentAddr)) {
      LOG.warn("Address change detected. Old: " + server.toString() +
          " New: " + currentAddr.toString());
      server = currentAddr;
    }
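To make the stale-address effect concrete, here is a minimal, self-contained
sketch. It is not the Hadoop code: RemoteId, Connection, setAddress and the
addr helper are hypothetical stand-ins, and the host name, IP addresses and
port 8020 are made up for illustration. It only shows that refreshing a
connection-local server field leaves the address held by the connection key
unchanged, and what changes if the key is updated as well (one possible
direction, not necessarily the fix that was adopted).

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class AddressRefreshSketch {

    /** Hypothetical stand-in for the connection key; new connections start from this address. */
    static final class RemoteId {
        private InetSocketAddress address;
        RemoteId(InetSocketAddress address) { this.address = address; }
        InetSocketAddress getAddress() { return address; }
        void setAddress(InetSocketAddress address) { this.address = address; } // assumption: mutable key
    }

    /** Hypothetical stand-in for a connection that keeps its own copy of the server address. */
    static final class Connection {
        final RemoteId remoteId;
        InetSocketAddress server;

        Connection(RemoteId remoteId) {
            this.remoteId = remoteId;
            this.server = remoteId.getAddress();
        }

        /** Refresh as described in the issue: only the local field changes. */
        boolean updateAddressLocalOnly(InetSocketAddress currentAddr) {
            if (!server.equals(currentAddr)) {
                server = currentAddr;
                return true;
            }
            return false;
        }

        /** Variant that also propagates the new address to the connection key. */
        boolean updateAddressAndRemoteId(InetSocketAddress currentAddr) {
            if (!server.equals(currentAddr)) {
                server = currentAddr;
                remoteId.setAddress(currentAddr);
                return true;
            }
            return false;
        }
    }

    /** Builds an address from literal IPv4 octets, so no DNS lookup is performed. */
    static InetSocketAddress addr(String host, int a, int b, int c, int d, int port) {
        try {
            byte[] ip = {(byte) a, (byte) b, (byte) c, (byte) d};
            return new InetSocketAddress(InetAddress.getByAddress(host, ip), port);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress oldAddr = addr("nn.example.com", 10, 0, 0, 1, 8020);
        InetSocketAddress newAddr = addr("nn.example.com", 10, 0, 0, 2, 8020);

        RemoteId id = new RemoteId(oldAddr);
        new Connection(id).updateAddressLocalOnly(newAddr);
        // The next connection is created from the key, which still holds the old address.
        System.out.println("local-only refresh, next connection is stale: "
            + new Connection(id).server.equals(oldAddr));

        new Connection(id).updateAddressAndRemoteId(newAddr);
        System.out.println("refresh with key update, next connection is current: "
            + new Connection(id).server.equals(newAddr));
    }
}
```

In the local-only variant, every connection created after an idle close starts
from the old address again and pays one failed attempt before refreshing.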
In a big cluster, this kind of retry causes random "BLOCK*
blk_16987635027_18010098516 is COMMITTED but not COMPLETE(numNodes= 0 < minimum
= 1) in file" errors when all three replicas each need one retry to read/write
the block.