Aihua Xu created HDFS-15800:
-------------------------------

Summary: DataNode to handle NameNode IP changes
Key: HDFS-15800
URL: https://issues.apache.org/jira/browse/HDFS-15800
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 2.8.0
Reporter: Aihua Xu

HADOOP-17068 handles the case where the NameNode IP address changes: the HDFS client updates the IP address after a connection failure. DataNodes use the same logic to refresh the IP address for their connections. Such a connection is reused until it has been idle for the default of 10 seconds (set by ipc.client.connection.maxidletime). Once the connection is closed, the DataNode reconnects using the old NameNode IP address and only refreshes to the new IP address after the first failure.

The problem with the refresh logic in org.apache.hadoop.ipc.Client is that the refreshed server value is not reflected in remoteId.address, while the next connection creation uses remoteId.address.

{{if (!server.equals(currentAddr)) {}}
{{  LOG.warn("Address change detected. Old: " + server.toString() +}}
{{    " New: " + currentAddr.toString());}}
{{  server = currentAddr;}}

In a big cluster, this kind of retry causes random "BLOCK* blk_16987635027_18010098516 is COMMITTED but not COMPLETE(numNodes= 0 < minimum = 1) in file" errors if all three replicas take one retry to read/write the block.
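
To make the stale-address behaviour described above concrete, here is a minimal standalone sketch in Java. It is not Hadoop code: the class AddressRefreshSketch, the in-memory DNS map, the host name nn.example.com, and the simplified RemoteId and Connection types are all hypothetical stand-ins for illustration. The method refreshServerAndRemoteId shows one possible remedy under the assumption that the cached address can be updated; it is not necessarily how the issue is resolved in Hadoop.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the stale-address behaviour; not Hadoop code. */
public class AddressRefreshSketch {

    /** Stand-in for DNS: host name -> current IPv4 address (hypothetical). */
    static final Map<String, byte[]> DNS = new HashMap<>();

    /** Builds an address from the fake DNS table, without a real lookup. */
    static InetSocketAddress resolve(String host, int port) {
        try {
            InetAddress ip = InetAddress.getByAddress(host, DNS.get(host));
            return new InetSocketAddress(ip, port);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Simplified stand-in for the connection id that caches an address. */
    static class RemoteId {
        InetSocketAddress address;
        RemoteId(InetSocketAddress address) { this.address = address; }
    }

    /** Simplified stand-in for one connection created from a RemoteId. */
    static class Connection {
        final RemoteId remoteId;
        InetSocketAddress server;

        Connection(RemoteId remoteId) {
            this.remoteId = remoteId;
            this.server = remoteId.address; // new connections start from the id
        }

        /** Mirrors the quoted snippet: only the local field is refreshed. */
        boolean refreshServerOnly() {
            InetSocketAddress currentAddr =
                resolve(server.getHostName(), server.getPort());
            if (!server.equals(currentAddr)) {
                server = currentAddr;
                return true;
            }
            return false;
        }

        /** One possible remedy (assumption): also update the shared id. */
        boolean refreshServerAndRemoteId() {
            boolean changed = refreshServerOnly();
            if (changed) {
                remoteId.address = server;
            }
            return changed;
        }
    }

    public static void main(String[] args) {
        DNS.put("nn.example.com", new byte[] {10, 0, 0, 1});
        RemoteId id = new RemoteId(resolve("nn.example.com", 8020));
        Connection first = new Connection(id);

        DNS.put("nn.example.com", new byte[] {10, 0, 0, 2}); // NameNode moves
        first.refreshServerOnly();                           // after a failure

        // The connection is closed after idling; the next one starts stale.
        Connection second = new Connection(id);
        System.out.println("first:  " + first.server.getAddress().getHostAddress());
        System.out.println("second: " + second.server.getAddress().getHostAddress());
    }
}
```

In this model, every connection created after an idle close starts from the old address and pays one failed attempt before refreshing, which matches the repeated retry described in the report. With refreshServerAndRemoteId, the second connection would start from the new address instead.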