[ https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Collins updated HDFS-3990:
------------------------------
    Attachment: hdfs-3990.txt

Maintaining both an ipAddr/hostName and a nodeAddr that hold the same information, and that can become inconsistent, is error-prone. For example, what do you do when the ipAddr and the nodeAddr disagree? The ipAddr field of a DatanodeID should never change, because it (together with the xferPort) is the unique key for a DataNode. We would also now have to worry about a state where we're both resolved and unresolved.

Given that the crux of the problem is that we want to cache the DNS lookup for the ipAddr of a DN, it seems simplest to just do that. What do you think of the attached patch? It sets the DatanodeID hostname field at registration time (like the IP addr), using the same lookup we do today, and replaces the two problematic lookups with uses of this field. This breaks {{dfs.datanode.hostname}}, but this config is only used by the tests and we can fix those up. I'm happy to do that in another rev of this patch if you like the approach.

I think a better approach would be to just use the lookup on the DN side (i.e. have the NN use the DN-reported value), but that's a riskier change.

> NN's health report has severe performance problems
> --------------------------------------------------
>
>                 Key: HDFS-3990
>                 URL: https://issues.apache.org/jira/browse/HDFS-3990
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-3990.patch, HDFS-3990.patch, hdfs-3990.txt
>
>
> The dfshealth page will place a read lock on the namespace while it does a
> DNS lookup for every DN. On a multi-thousand node cluster, this often
> results in 10s+ load time for the health page. 10 concurrent requests were
> found to cause 7m+ load times during which time write operations blocked.
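For readers following the thread, the approach proposed in the comment above (resolve the hostname once at registration, then have the report path read the cached field) can be illustrated with a minimal, self-contained sketch. This is not the code in hdfs-3990.txt. The names `CachedHostNameSketch`, `NodeId`, and `Registry`, the injected resolver function, and the port number are all hypothetical stand-ins chosen for illustration, and the namespace lock is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CachedHostNameSketch {

    /** Hypothetical stand-in for a DatanodeID: ipAddr + xferPort identify the node. */
    static final class NodeId {
        final String ipAddr;
        final int xferPort;
        final String hostName; // resolved once, at registration time

        NodeId(String ipAddr, int xferPort, String hostName) {
            this.ipAddr = ipAddr;
            this.xferPort = xferPort;
            this.hostName = hostName;
        }
    }

    /** Hypothetical registry; the resolver is injected so the sketch needs no real DNS. */
    static final class Registry {
        private final Function<String, String> resolver;
        private final List<NodeId> nodes = new ArrayList<>();

        Registry(Function<String, String> resolver) {
            this.resolver = resolver;
        }

        /** Registration path: the only place the resolver is called. */
        NodeId register(String ipAddr, int xferPort) {
            NodeId id = new NodeId(ipAddr, xferPort, resolver.apply(ipAddr));
            nodes.add(id);
            return id;
        }

        /** Report path: reads the cached hostName field and performs no lookups. */
        List<String> healthReport() {
            List<String> lines = new ArrayList<>();
            for (NodeId n : nodes) {
                lines.add(n.hostName + " (" + n.ipAddr + ":" + n.xferPort + ")");
            }
            return lines;
        }
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        // Fake resolver that counts how often it is invoked.
        Function<String, String> fakeDns = ip -> {
            lookups.incrementAndGet();
            return "host-" + ip;
        };

        Registry registry = new Registry(fakeDns);
        registry.register("10.0.0.1", 50010); // illustrative port value
        registry.register("10.0.0.2", 50010);

        // Ten report requests trigger no further lookups.
        for (int i = 0; i < 10; i++) {
            registry.healthReport();
        }
        System.out.println("lookups=" + lookups.get()); // prints "lookups=2"
    }
}
```

In this sketch the number of lookups grows with the number of registrations rather than with the number of report requests, which is the property the comment is after. How a changed DNS entry would be picked up after registration is left out here.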