[ https://issues.apache.org/jira/browse/HDFS-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212457#comment-15212457 ]
Arpit Agarwal commented on HDFS-10203:
--------------------------------------

bq. Have DFSClient compute its client machine's network location and pass it to the NN via a new ClientProtocol#getBlockLocations method, so that the NN no longer needs to do topology resolution for client machines.

This sounds like a good idea. It can be done compatibly by introducing a new protobuf field.

bq. For compatibility, we might need a new method, given that we change the semantics from sorted datanodes to unsorted datanodes; that way, an old DFSClient and a new NN can still achieve the read-from-closer-datanode requirement.

This can also be done compatibly by having the client pass a new flag that requests unsorted locations.

> Excessive topology lookup for large number of client machines in namenode
> -------------------------------------------------------------------------
>
>                 Key: HDFS-10203
>                 URL: https://issues.apache.org/jira/browse/HDFS-10203
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>
> In the {{ClientProtocol#getBlockLocations}} call, DatanodeManager computes
> the network distance between the client machine and the datanodes. As part of
> that, it needs to resolve the network location of the client machine. If the
> client machine isn't a datanode, it needs to ask {{DNSToSwitchMapping}} to
> resolve it.
> {noformat}
> public void sortLocatedBlocks(final String targethost,
>     final List<LocatedBlock> locatedblocks) {
>   //sort the blocks
>   // As it is possible for the separation of node manager and datanode,
>   // here we should get node but not datanode only.
>   Node client = getDatanodeByHost(targethost);
>   if (client == null) {
>     List<String> hosts = new ArrayList<>(1);
>     hosts.add(targethost);
>     List<String> resolvedHosts = dnsToSwitchMapping.resolve(hosts);
>     if (resolvedHosts != null && !resolvedHosts.isEmpty()) {
>       ....
>     }
>   }
> }
> {noformat}
> When there are tens of thousands of non-datanode client machines hitting a
> namenode that uses {{ScriptBasedMapping}}, it causes the following issues:
> * After namenode startup, {{CachedDNSToSwitchMapping}} only has datanodes in
> its cache. Calls from many different client machines mean lots of process
> forks to run the topology script, which can cause a spike in namenode load.
> * The cache of {{CachedDNSToSwitchMapping}} can grow large. Under a normal
> workload, say < 100k client machines with each cache entry using < 200
> bytes, it takes up to 20MB, not much compared to the NN's heap size. But in
> theory it can still blow up the NN if there is a misconfiguration or a
> malicious attack with millions of IP addresses.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
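The failure mode described above can be sketched in a few lines. This is a minimal, hypothetical Java model, not the actual {{CachedDNSToSwitchMapping}} code: hosts preloaded into the cache (the datanodes, after startup) never trigger the topology script, while every distinct non-datanode client host causes one simulated script fork and adds one cache entry. The class name, the invocation counter, and the {{/default-rack}} placeholder are illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a cached topology resolver. A cache miss stands in
// for forking the configured topology script (as ScriptBasedMapping would).
class CachedTopologyResolver {
    private final Map<String, String> cache = new HashMap<>();
    int scriptInvocations = 0; // counts simulated topology-script forks

    // Pre-populate the cache, as the NN effectively does for registered datanodes.
    void preload(String host, String rack) {
        cache.put(host, rack);
    }

    String resolve(String host) {
        String rack = cache.get(host);
        if (rack == null) {
            scriptInvocations++;      // cache miss: fork the topology script
            rack = "/default-rack";   // placeholder result for this sketch
            cache.put(host, rack);    // cache grows by one entry per unique host
        }
        return rack;
    }

    public static void main(String[] args) {
        CachedTopologyResolver r = new CachedTopologyResolver();
        r.preload("dn1.example.com", "/rack1"); // datanode: already cached
        r.resolve("dn1.example.com");            // no script fork
        r.resolve("client-a.example.com");       // fork #1
        r.resolve("client-b.example.com");       // fork #2
        r.resolve("client-a.example.com");       // cached now, no fork
        System.out.println("script invocations: " + r.scriptInvocations);
    }
}
```

Under the proposal in the comment above, the client would compute and send its own network location, so this miss path on the NN (and the per-client cache entry) would be avoided entirely for new clients.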