[
https://issues.apache.org/jira/browse/HDFS-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212457#comment-15212457
]
Arpit Agarwal edited comment on HDFS-10203 at 3/25/16 9:44 PM:
---------------------------------------------------------------
bq. Have DFSClient computes its client machine's network location and pass it
to NN via a new ClientProtocol#getBlockLocations method so that NN no longer
needs to do topology resolution for client machines.
This sounds like a good idea. It can be done compatibly with the existing
method by introducing a new protobuf field.
bq. For compatibility, we might need a new method given we change the semantics
from sorted datanodes to unsorted datanodes; in that way, old DFSClient and new
NN can still achieve the read-from-closer-datanode requirement.
This can also be done compatibly by the client passing a new flag that requests
unsorted locations.
was (Author: arpitagarwal):
bq. Have DFSClient computes its client machine's network location and pass it
to NN via a new ClientProtocol#getBlockLocations method so that NN no longer
needs to do topology resolution for client machines.
This sounds like a good idea. It can be done compatibly by introducing a new
protobuf field.
bq. For compatibility, we might need a new method given we change the semantics
from sorted datanodes to unsorted datanodes; in that way, old DFSClient and new
NN can still achieve the read-from-closer-datanode requirement.
This can also be done compatibly by the client passing a new flag that requests
unsorted locations.
> Excessive topology lookup for large number of client machines in namenode
> -------------------------------------------------------------------------
>
> Key: HDFS-10203
> URL: https://issues.apache.org/jira/browse/HDFS-10203
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ming Ma
>
> In the {{ClientProtocol#getBlockLocations}} call, DatanodeManager computes
> the network distance between the client machine and the datanodes. As part of
> that, it needs to resolve the network location of the client machine. If the
> client machine isn't a datanode, it needs to ask {{DNSToSwitchMapping}} to
> resolve it.
> {noformat}
> public void sortLocatedBlocks(final String targethost,
> final List<LocatedBlock> locatedblocks) {
> //sort the blocks
> // As it is possible for the separation of node manager and datanode,
> // here we should get node but not datanode only .
> Node client = getDatanodeByHost(targethost);
> if (client == null) {
> List<String> hosts = new ArrayList<> (1);
> hosts.add(targethost);
> List<String> resolvedHosts = dnsToSwitchMapping.resolve(hosts);
> if (resolvedHosts != null && !resolvedHosts.isEmpty()) {
> ....
> }
> }
> }
> {noformat}
> When there are ten of thousands of non-datanode client machines hitting the
> namenode which uses {{ScriptBasedMapping}}, it causes the following issues:
> * After namenode startup, {{CachedDNSToSwitchMapping}} only has datanodes in
> the cache. Calls from many different client machines means lots of process
> fork to run the topology script and can cause spike in namenode load.
> * Cache size of {{CachedDNSToSwitchMapping}} can grow large. Under normal
> workload say < 100k client machines and each entry in the cache uses < 200
> bytes, it will take up to 20MB, not much compared to NN's heap size. But in
> theory it can still blow up NN if there is misconfiguration or malicious
> attack with millions of IP addresses.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)