[
https://issues.apache.org/jira/browse/HDFS-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209837#comment-15209837
]
Ming Ma commented on HDFS-10203:
--------------------------------
A couple of ideas we talked about:
* Use {{TableMapping}} or a better version of {{ScriptBasedMapping}} that can
skip launching the script command for any non-datanode client machine.
* Add a new configuration in NN to control whether DatanodeManager should skip
topology resolution for any non-datanode client machine, regardless of which
{{DNSToSwitchMapping}} class is used. That won't work if YARN node managers
aren't on the same machines as datanodes.
* Have DFSClient compute its client machine's network location and pass it to
NN via a new {{ClientProtocol#getBlockLocations}} method so that NN no longer
needs to do topology resolution for client machines.
* Have NN pass an unsorted datanode list in {{ClientProtocol#getBlockLocations}}
and have DFSClient sort that list by network distance, given the work already
done in https://issues.apache.org/jira/browse/HDFS-9579. For compatibility, we
might need a new method, since the semantics change from sorted datanodes to
unsorted datanodes; that way, an old DFSClient talking to a new NN can still
achieve the read-from-closer-datanode requirement.
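
To make the last two ideas concrete, here is a rough, self-contained sketch of client-side sorting. It does not use any Hadoop classes: {{ClientSideSortSketch}}, {{distance}} and {{sortByDistance}} are hypothetical names, and network locations are assumed to be plain path strings such as {{/rack1/host1}}. The distance is counted as the number of hops to the closest common ancestor, which is only an approximation of what the real topology code does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Hadoop code: the client already knows its own
// network location and sorts an unsorted datanode list by distance.
public class ClientSideSortSketch {

    // Distance between two locations such as "/rack1/host1":
    // hops from each node up to their closest common ancestor.
    static int distance(String a, String b) {
        String[] pa = a.split("/");
        String[] pb = b.split("/");
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    // Returns a new list ordered from closest to farthest; ties keep the
    // order in which the (assumed unsorted) list was received.
    static List<String> sortByDistance(String clientLocation, List<String> datanodes) {
        List<String> sorted = new ArrayList<>(datanodes);
        sorted.sort(Comparator.comparingInt(dn -> distance(clientLocation, dn)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> unsorted = Arrays.asList("/rack2/host3", "/rack1/host2", "/rack1/host1");
        // prints [/rack1/host1, /rack1/host2, /rack2/host3]
        System.out.println(sortByDistance("/rack1/host1", unsorted));
    }
}
```

With this shape, the NN would not need to resolve the client's location at all; whether the client can reliably know its own location is a separate question.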
Thoughts?
> Excessive topology lookup for large number of client machines in namenode
> -------------------------------------------------------------------------
>
> Key: HDFS-10203
> URL: https://issues.apache.org/jira/browse/HDFS-10203
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ming Ma
>
> In the {{ClientProtocol#getBlockLocations}} call, DatanodeManager computes
> the network distance between the client machine and the datanodes. As part of
> that, it needs to resolve the network location of the client machine. If the
> client machine isn't a datanode, it needs to ask {{DNSToSwitchMapping}} to
> resolve it.
> {noformat}
>   public void sortLocatedBlocks(final String targethost,
>       final List<LocatedBlock> locatedblocks) {
>     //sort the blocks
>     // As it is possible for the separation of node manager and datanode,
>     // here we should get node but not datanode only.
>     Node client = getDatanodeByHost(targethost);
>     if (client == null) {
>       List<String> hosts = new ArrayList<>(1);
>       hosts.add(targethost);
>       List<String> resolvedHosts = dnsToSwitchMapping.resolve(hosts);
>       if (resolvedHosts != null && !resolvedHosts.isEmpty()) {
>         ....
>       }
>     }
>   }
> {noformat}
> When there are tens of thousands of non-datanode client machines hitting a
> namenode that uses {{ScriptBasedMapping}}, it causes the following issues:
> * After namenode startup, {{CachedDNSToSwitchMapping}} only has datanodes in
> the cache. Calls from many different client machines mean many process
> forks to run the topology script, which can cause a spike in namenode load.
> * The cache size of {{CachedDNSToSwitchMapping}} can grow large. Under a normal
> workload, say < 100k client machines with each cache entry using < 200
> bytes, it will take up to 20MB, which is not much compared to NN's heap size.
> But in theory it can still blow up the NN if there is a misconfiguration or a
> malicious attack with millions of IP addresses.
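
For reference, the cache-size estimate in the description can be checked with a few lines of arithmetic. This is only a sketch: {{CacheSizeEstimate}} and {{estimateBytes}} are hypothetical names, the 200 bytes per entry is the upper bound quoted above, and the 10 million entries figure is an assumed stand-in for "millions of IP addresses".

```java
// Hypothetical back-of-the-envelope estimate, not Hadoop code.
public class CacheSizeEstimate {

    // Upper bound on cache memory: number of entries times bytes per entry.
    static long estimateBytes(long entries, long bytesPerEntry) {
        return Math.multiplyExact(entries, bytesPerEntry);
    }

    public static void main(String[] args) {
        // Normal workload from the description: 100k clients, 200 bytes each.
        System.out.println(estimateBytes(100_000L, 200L));    // 20000000 bytes, about 20 MB
        // Assumed attack or misconfiguration: 10 million distinct addresses.
        System.out.println(estimateBytes(10_000_000L, 200L)); // 2000000000 bytes, about 2 GB
    }
}
```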