[ https://issues.apache.org/jira/browse/HDFS-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209837#comment-15209837 ]
Ming Ma commented on HDFS-10203: -------------------------------- Couple ideas we talked about: * Use {{TableMapping}} or better version of {{ScriptBasedMapping}} that can skip the launch of script command for any non-datanode client machines. * Add a new configuration in NN to control if DatanodeManager should skip the topology resolution for any non-datanode client machines, regardless which {{DNSToSwitchMapping}} class is used. That won't work if YARN node managers aren't on the same machines as datanodes. * Have DFSClient computes its client machine's network location and pass it to NN via a new {{ClientProtocol#getBlockLocations}} method so that NN no longer needs to do topology resolution for client machines. * Have NN pass unsorted datanode list in {{ClientProtocol#getBlockLocations}} and have DFSClient sort that list by network distance, given we have done some work in https://issues.apache.org/jira/browse/HDFS-9579. For compatibility, we might need a new method given we change the semantics from sorted datanodes to unsorted datanodes; in that way, old DFSClient and new NN can still achieve the read-from-closer-datanode requirement. Thoughts? > Excessive topology lookup for large number of client machines in namenode > ------------------------------------------------------------------------- > > Key: HDFS-10203 > URL: https://issues.apache.org/jira/browse/HDFS-10203 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Ming Ma > > In the {{ClientProtocol#getBlockLocations}} call, DatanodeManager computes > the network distance between the client machine and the datanodes. As part of > that, it needs to resolve the network location of the client machine. If the > client machine isn't a datanode, it needs to ask {{DNSToSwitchMapping}} to > resolve it. > {noformat} > public void sortLocatedBlocks(final String targethost, > final List<LocatedBlock> locatedblocks) { > //sort the blocks > // As it is possible for the separation of node manager and datanode, > // here we should get node but not datanode only . > Node client = getDatanodeByHost(targethost); > if (client == null) { > List<String> hosts = new ArrayList<> (1); > hosts.add(targethost); > List<String> resolvedHosts = dnsToSwitchMapping.resolve(hosts); > if (resolvedHosts != null && !resolvedHosts.isEmpty()) { > .... > } > } > } > {noformat} > When there are ten of thousands of non-datanode client machines hitting the > namenode which uses {{ScriptBasedMapping}}, it causes the following issues: > * After namenode startup, {{CachedDNSToSwitchMapping}} only has datanodes in > the cache. Calls from many different client machines means lots of process > fork to run the topology script and can cause spike in namenode load. > * Cache size of {{CachedDNSToSwitchMapping}} can grow large. Under normal > workload say < 100k client machines and each entry in the cache uses < 200 > bytes, it will take up to 20MB, not much compared to NN's heap size. But in > theory it can still blow up NN if there is misconfiguration or malicious > attack with millions of IP addresses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)