[ https://issues.apache.org/jira/browse/HDFS-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867808#comment-16867808 ]
Kihwal Lee commented on HDFS-14579:
-----------------------------------
Loading the include/exclude files triggers DNS lookups unless every host entry
in the files is an IP address. Usually the cost is paid once, and a local
caching daemon (e.g. nscd) then makes subsequent refreshes faster. But it all
depends on how often the files are refreshed and how the caching is
configured. If a host file contains unresolvable host names, those entries
will hit DNS on every refresh, since negative cache entries usually expire
much sooner.
I agree it is an expensive write op, but it is also not called frequently
enough to cause issues. On one of our clusters, {{refreshNodes}} takes about
54 ms on average, excluding the lock wait and queue time. That cluster has
about 4,600 hostnames in include and 300 in exclude. The call queue needs to
be big enough to absorb blips like this.
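As a rough illustration of the first point, the sketch below counts how many
host-file entries are host names (and so would need a resolver call) versus
IPv4 literals. The class and method names are hypothetical and are not part of
HDFS; the literal check is deliberately simple and ignores IPv6 and shorthand
IPv4 forms.

```java
import java.util.List;

/** Illustrative helper only; not an HDFS class. */
class HostEntryCheck {

  /** Rough test for a dotted-quad IPv4 literal such as "10.0.0.1". */
  static boolean looksLikeIpv4Literal(String entry) {
    String[] parts = entry.trim().split("\\.", -1);
    if (parts.length != 4) {
      return false;
    }
    for (String part : parts) {
      if (part.isEmpty() || part.length() > 3) {
        return false;
      }
      for (int i = 0; i < part.length(); i++) {
        char c = part.charAt(i);
        if (c < '0' || c > '9') {
          return false;
        }
      }
      if (Integer.parseInt(part) > 255) {
        return false;
      }
    }
    return true;
  }

  /** Counts entries that are not IPv4 literals, i.e. candidates for a DNS lookup. */
  static int countNeedingLookup(List<String> entries) {
    int count = 0;
    for (String entry : entries) {
      if (!looksLikeIpv4Literal(entry)) {
        count++;
      }
    }
    return count;
  }
}
```

Running countNeedingLookup over the include and exclude lists gives a quick
estimate of how many resolver calls a refresh can trigger when nothing is
cached.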
> In refreshNodes, avoid performing a DNS lookup while holding the write lock
> ---------------------------------------------------------------------------
>
> Key: HDFS-14579
> URL: https://issues.apache.org/jira/browse/HDFS-14579
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-14579.001.patch
>
>
> When refreshNodes is called on a large cluster, or a cluster where DNS is not
> performing well, it can cause the namenode to hang for a long time. This is
> because the refreshNodes operation holds the global write lock while it is
> running. Most of the refreshNodes code is simple and hence fast, but
> unfortunately it performs a DNS lookup for each host in the cluster while the
> lock is held.
> Right now, it calls:
> {code}
> public void refreshNodes(final Configuration conf) throws IOException {
>   refreshHostsReader(conf);
>   namesystem.writeLock();
>   try {
>     refreshDatanodes();
>     countSoftwareVersions();
>   } finally {
>     namesystem.writeUnlock();
>   }
> }
> {code}
> The line refreshHostsReader(conf); reads the new config file and does a DNS
> lookup on each entry - the write lock is not held here. Then the main work is
> done here:
> {code}
> private void refreshDatanodes() {
>   final Map<String, DatanodeDescriptor> copy;
>   synchronized (this) {
>     copy = new HashMap<>(datanodeMap);
>   }
>   for (DatanodeDescriptor node : copy.values()) {
>     // Check if not include.
>     if (!hostConfigManager.isIncluded(node)) {
>       node.setDisallowed(true);
>     } else {
>       long maintenanceExpireTimeInMS =
>           hostConfigManager.getMaintenanceExpirationTimeInMS(node);
>       if (node.maintenanceNotExpired(maintenanceExpireTimeInMS)) {
>         datanodeAdminManager.startMaintenance(
>             node, maintenanceExpireTimeInMS);
>       } else if (hostConfigManager.isExcluded(node)) {
>         datanodeAdminManager.startDecommission(node);
>       } else {
>         datanodeAdminManager.stopMaintenance(node);
>         datanodeAdminManager.stopDecommission(node);
>       }
>     }
>     node.setUpgradeDomain(hostConfigManager.getUpgradeDomain(node));
>   }
> }
> {code}
> The isIncluded(), isExcluded() etc. methods all call
> node.getResolvedAddress(), which does the DNS lookup. We could probably
> change things to perform all the DNS lookups outside of the write lock, and
> then take the lock and process the nodes. We would also change or overload
> isIncluded() etc. to take the InetAddress rather than the datanode
> descriptor.
> It would not shorten the time the operation takes to run overall, but it
> would move the long duration out of the write lock and avoid blocking the
> namenode for the entire time.
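The idea in the description (resolve first, lock second) can be sketched
roughly as follows. This is a standalone illustration and not the attached
patch: RefreshSketch, resolveAll, refresh and isDisallowed are hypothetical
names, and a plain ReentrantReadWriteLock stands in for the namesystem lock.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the refresh logic; does not use HDFS classes. */
class RefreshSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Boolean> disallowed = new HashMap<>();

  /** Phase 1: resolve every host with no lock held; unresolvable hosts map to null. */
  static Map<String, InetAddress> resolveAll(List<String> hosts) {
    Map<String, InetAddress> resolved = new HashMap<>();
    for (String host : hosts) {
      try {
        resolved.put(host, InetAddress.getByName(host));
      } catch (UnknownHostException e) {
        resolved.put(host, null);
      }
    }
    return resolved;
  }

  /** Phase 2: take the write lock and decide using only pre-resolved addresses. */
  void refresh(List<String> hosts, Set<InetAddress> included) {
    // The slow part (possible DNS traffic) runs before the lock is taken.
    Map<String, InetAddress> resolved = resolveAll(hosts);
    lock.writeLock().lock();
    try {
      for (Map.Entry<String, InetAddress> e : resolved.entrySet()) {
        InetAddress addr = e.getValue();
        // Mirrors the "not included => disallowed" branch, keyed by address.
        disallowed.put(e.getKey(), addr == null || !included.contains(addr));
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns the last decision for a host; false if the host was never refreshed. */
  boolean isDisallowed(String host) {
    lock.readLock().lock();
    try {
      return disallowed.getOrDefault(host, false);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The total work is unchanged, but the time spent under the write lock no longer
depends on resolver latency. One trade-off in this shape is that the resolved
addresses are a snapshot, so a host whose DNS record changes between the two
phases is handled on the next refresh.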