[ 
https://issues.apache.org/jira/browse/HDFS-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209837#comment-15209837
 ] 

Ming Ma commented on HDFS-10203:
--------------------------------

A couple of ideas we talked about:

* Use {{TableMapping}} or an improved version of {{ScriptBasedMapping}} that 
can skip launching the script command for non-datanode client machines (a 
minimal sketch follows this list).
* Add a new configuration in the NN to control whether DatanodeManager should 
skip topology resolution for non-datanode client machines, regardless of which 
{{DNSToSwitchMapping}} class is used. That won't work if YARN node managers 
aren't on the same machines as datanodes.
* Have DFSClient compute its own machine's network location and pass it to the 
NN via a new {{ClientProtocol#getBlockLocations}} method, so that the NN no 
longer needs to do topology resolution for client machines.
* Have the NN return an unsorted datanode list from 
{{ClientProtocol#getBlockLocations}} and have DFSClient sort that list by 
network distance, building on the work done in 
https://issues.apache.org/jira/browse/HDFS-9579. For compatibility, we might 
need a new method since this changes the semantics from sorted to unsorted 
datanodes; that way, an old DFSClient talking to a new NN can still meet the 
read-from-the-closest-datanode requirement.
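
To make the first idea a bit more concrete, here is a minimal sketch (not a 
patch; the class name and the {{isDatanode}} predicate are placeholders) of a 
{{DNSToSwitchMapping}} wrapper that only consults the underlying mapping, e.g. 
{{ScriptBasedMapping}}, for hosts known to be datanodes and returns 
{{NetworkTopology.DEFAULT_RACK}} for everything else. How such a mapping would 
actually learn the datanode host set is the open part of the design.

{noformat}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

import org.apache.hadoop.net.DNSToSwitchMapping;
import org.apache.hadoop.net.NetworkTopology;

/**
 * Sketch only: resolves topology through the wrapped mapping for datanode
 * hosts and returns the default rack for everything else, so non-datanode
 * clients never trigger a topology script fork.
 */
public class DatanodeOnlyMapping implements DNSToSwitchMapping {
  private final DNSToSwitchMapping delegate;   // e.g. ScriptBasedMapping
  private final Predicate<String> isDatanode;  // placeholder datanode check

  public DatanodeOnlyMapping(DNSToSwitchMapping delegate,
                             Predicate<String> isDatanode) {
    this.delegate = delegate;
    this.isDatanode = isDatanode;
  }

  @Override
  public List<String> resolve(List<String> names) {
    // Batch only the datanode hosts into a single delegate call.
    List<String> datanodeHosts = new ArrayList<>();
    for (String name : names) {
      if (isDatanode.test(name)) {
        datanodeHosts.add(name);
      }
    }
    List<String> resolved = datanodeHosts.isEmpty()
        ? Collections.<String>emptyList()
        : delegate.resolve(datanodeHosts);

    // Merge: datanodes get the delegate's answer, everyone else gets the
    // default rack without any script invocation.
    List<String> result = new ArrayList<>(names.size());
    int i = 0;
    for (String name : names) {
      if (isDatanode.test(name) && resolved != null && i < resolved.size()) {
        result.add(resolved.get(i++));
      } else {
        result.add(NetworkTopology.DEFAULT_RACK);
      }
    }
    return result;
  }

  @Override
  public void reloadCachedMappings() {
    delegate.reloadCachedMappings();
  }

  @Override
  public void reloadCachedMappings(List<String> names) {
    delegate.reloadCachedMappings(names);
  }
}
{noformat}

The second idea would put the same short-circuit inside DatanodeManager behind 
a new configuration key instead of at the mapping layer.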

Thoughts?

> Excessive topology lookup for large number of client machines in namenode
> -------------------------------------------------------------------------
>
>                 Key: HDFS-10203
>                 URL: https://issues.apache.org/jira/browse/HDFS-10203
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>
> In the {{ClientProtocol#getBlockLocations}} call, DatanodeManager computes 
> the network distance between the client machine and the datanodes. As part of 
> that, it needs to resolve the network location of the client machine. If the 
> client machine isn't a datanode, it needs to ask {{DNSToSwitchMapping}} to 
> resolve it.
> {noformat}
>   public void sortLocatedBlocks(final String targethost,
>       final List<LocatedBlock> locatedblocks) {
>     //sort the blocks
>     // Node managers may run on hosts that are not datanodes, so look the
>     // client up as a general node rather than only as a datanode.
>     Node client = getDatanodeByHost(targethost);
>     if (client == null) {
>       List<String> hosts = new ArrayList<> (1);
>       hosts.add(targethost);
>       List<String> resolvedHosts = dnsToSwitchMapping.resolve(hosts);
>       if (resolvedHosts != null && !resolvedHosts.isEmpty()) {
>       ....
>       }
>     }
>   }
> {noformat}
> When tens of thousands of non-datanode client machines hit a namenode that 
> uses {{ScriptBasedMapping}}, it causes the following issues:
> * After namenode startup, {{CachedDNSToSwitchMapping}} only has datanodes in 
> its cache. Calls from many different client machines mean lots of process 
> forks to run the topology script, which can cause a spike in namenode load.
> * The cache in {{CachedDNSToSwitchMapping}} can grow large. Under a normal 
> workload, say < 100k client machines with each cache entry using < 200 
> bytes, it takes up to 20MB, which is not much compared to the NN's heap 
> size. But in theory it can still blow up the NN if a misconfiguration or a 
> malicious attack floods it with millions of IP addresses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
