[
https://issues.apache.org/jira/browse/HADOOP-17504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aihua Xu updated HADOOP-17504:
------------------------------
Attachment: HADOOP-17504.patch
> New connection requires a retry to refresh NameNode IP changes
> --------------------------------------------------------------
>
> Key: HADOOP-17504
> URL: https://issues.apache.org/jira/browse/HADOOP-17504
> Project: Hadoop Common
> Issue Type: Improvement
> Components: common
> Affects Versions: 2.8.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Priority: Major
> Attachments: HADOOP-17504.patch
>
>
> HADOOP-17068 handles the case of NameNode IP address changes: the HDFS
> client updates the IP address after a connection failure. DataNodes use the
> same logic to refresh the IP address for their connections.
> Such a connection is reused with a default idle time of 10 seconds (set by
> ipc.client.connection.maxidletime). Once the connection is closed, the
> DataNode will connect with the old NameNode IP address and only refresh to
> the new IP address after the first failure.
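>
> For reference, a minimal sketch of how that idle time can be read from the
> client Configuration. The key name and its 10000 ms default come from
> hadoop-common; the class and main method below are only for illustration.
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class IdleTimeExample {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // How long an idle IPC connection is kept before the client closes it.
>     // Defaults to 10000 ms (10 seconds) unless overridden in core-site.xml.
>     long maxIdleMs = conf.getLong("ipc.client.connection.maxidletime", 10000);
>     System.out.println("ipc.client.connection.maxidletime = " + maxIdleMs + " ms");
>   }
> }
> {code}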
> The problem with the refresh logic in org.apache.hadoop.ipc.Client is that
> the refreshed server value is not propagated to remoteId.address, while the
> next connection creation uses remoteId.address:
> {{if (!server.equals(currentAddr)) {}}
> {{  LOG.warn("Address change detected. Old: " + server.toString() +}}
> {{      " New: " + currentAddr.toString());}}
> {{  server = currentAddr;}}
> {{}}}
>
> In a big cluster, such a retry will cause random "BLOCK*
> blk_16987635027_18010098516 is COMMITTED but not COMPLETE(numNodes= 0 <
> minimum = 1) in file" errors if all three replicas take one retry to
> read/write the block.
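>
> One possible direction (a sketch only, not necessarily what the attached
> HADOOP-17504.patch does) is to propagate the refreshed address to the
> ConnectionId as well, so the next connection setup does not go back to the
> stale address. The remoteId.setAddress(...) call below is assumed for
> illustration; the snippet is a fragment of Client.Connection, not a
> standalone class.
> {code:java}
> // Fragment of org.apache.hadoop.ipc.Client.Connection, for illustration only.
> private synchronized boolean updateAddress() throws IOException {
>   // Re-resolve the NameNode host name to pick up a changed IP.
>   InetSocketAddress currentAddr = NetUtils.createSocketAddrForHost(
>       server.getHostName(), server.getPort());
>   if (!server.equals(currentAddr)) {
>     LOG.warn("Address change detected. Old: " + server + " New: " + currentAddr);
>     server = currentAddr;
>     // Keep the ConnectionId in sync so that the next connection creation
>     // uses the refreshed address instead of the stale one.
>     remoteId.setAddress(currentAddr); // assumed setter, for illustration
>     return true;
>   }
>   return false;
> }
> {code}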
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]