[ https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daryn Sharp updated HDFS-3990:
------------------------------
    Attachment: HDFS-3990.patch

No longer initializes {{peerHostName}} to the DN's registration hostname. Checks for null when building the list of node names to filter.

I again looked into removing the null check on {{Server.getRemoteAddress}}. The tests that call directly into the RPC server object, rather than via a connection, appear to be passing mock DN registrations. So the majority of functional tests match real cluster behavior. I tried having the RPC server set the IP/peerHostName, but some of the tests verify that the layout and version checks work. I then tried to push those checks down into {{FSNamesystem#registerDatanode}}, but that method isn't exposed for the tests to call.

If this patch is OK, I'll update the branch-0.23 patch.

> NN's health report has severe performance problems
> --------------------------------------------------
>
>                 Key: HDFS-3990
>                 URL: https://issues.apache.org/jira/browse/HDFS-3990
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-3990.branch-0.23.patch, HDFS-3990.patch,
> HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch,
> HDFS-3990.patch, HDFS-3990.patch, hdfs-3990.txt, hdfs-3990.txt
>
>
> The dfshealth page will place a read lock on the namespace while it does a
> DNS lookup for every DN. On a multi-thousand node cluster, this often
> results in 10s+ load time for the health page. 10 concurrent requests were
> found to cause 7m+ load times during which time write operations blocked.
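
For readers unfamiliar with the problem in the quoted description, here is a minimal Java sketch of the general pattern: slow per-node lookups performed while a read lock is held, compared with copying the node list under the lock and resolving names after releasing it. This is not the code from the attached patch and does not use any HDFS classes. {{HealthReportSketch}}, its methods, and the pluggable {{resolver}} (standing in for a DNS lookup) are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical stand-in for a namespace that tracks datanode addresses.
// None of these names come from HDFS.
class HealthReportSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> datanodeAddrs = new ArrayList<>();
    // Assumed slow operation, e.g. a wrapper around a reverse DNS lookup.
    private final Function<String, String> resolver;

    HealthReportSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Writers need the write lock, so they wait while any reader holds the lock.
    void register(String addr) {
        lock.writeLock().lock();
        try {
            datanodeAddrs.add(addr);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Problem pattern: every slow lookup runs while the read lock is held.
    List<String> reportHoldingLock() {
        lock.readLock().lock();
        try {
            List<String> names = new ArrayList<>();
            for (String addr : datanodeAddrs) {
                names.add(resolver.apply(addr));
            }
            return names;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Alternative: copy the addresses under the lock, then resolve after release.
    List<String> reportOutsideLock() {
        List<String> snapshot;
        lock.readLock().lock();
        try {
            snapshot = new ArrayList<>(datanodeAddrs);
        } finally {
            lock.readLock().unlock();
        }
        List<String> names = new ArrayList<>();
        for (String addr : snapshot) {
            names.add(resolver.apply(addr));
        }
        return names;
    }

    // Number of read locks currently held; used only to observe the difference.
    int readHolds() {
        return lock.getReadLockCount();
    }
}
```

In the first method the lock is held for the combined duration of all lookups, which is when writers would stall. In the second it is held only for the list copy. Another option, closer in spirit to the {{peerHostName}} comment above, would be to record the name once at registration time so the report needs no lookup at all. That variant is not shown here.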