zhihai xu created MAPREDUCE-6224:
------------------------------------

             Summary: resolve the hosts in DNSToSwitchMapping before inter tracker server start to avoid IPC timeout in Task Tracker heartbeat
                 Key: MAPREDUCE-6224
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6224
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv1
            Reporter: zhihai xu
            Assignee: zhihai xu


Resolve the hosts to fill up the cache in CachedDNSToSwitchMapping before the
inter tracker server starts, to avoid IPC timeouts in Task Tracker heartbeats.

We saw IPC timeouts in Task Tracker heartbeats on a large MR1 cluster that
uses a topology script (run through ShellCommandExecutor) in ScriptBasedMapping
to resolve the network topology of Task Tracker hosts.
The reason is as follows. Right after the inter tracker server starts in the
Job Tracker, the Job Tracker receives a large number of heartbeats from Task
Trackers. The heartbeat function calls resolveAndAddToTopology to resolve the
network topology of each Task Tracker host through ScriptBasedMapping, which is
based on CachedDNSToSwitchMapping. ScriptBasedMapping#resolve first checks
whether the host is in the cache. If it is not, it runs the topology script
through ShellCommandExecutor to get the host's network topology. Running the
topology script is normally time consuming, so on a large MR1 cluster, when
many heartbeats arrive at the same time, the heartbeat IPC calls can time out.
The solution is to resolve the network topology of all hosts in the hosts list
from HostsFileReader before receiving any heartbeat from Task Trackers. This
fills up the cache in ScriptBasedMapping, so when a heartbeat calls
resolveAndAddToTopology, the result comes from the cache instead of from
running the topology script.
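The proposed fix can be sketched in the same hypothetical style: warm the cache
with every host from the hosts list before the server starts accepting
heartbeats. TopologyWarmUpSketch, CachedMapping, warmUp and onHeartbeat are
illustrative names, not JobTracker code; in Hadoop the equivalent step would
presumably pass the host names read by HostsFileReader to the mapping's
DNSToSwitchMapping#resolve(List<String>) method.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of the proposed fix: warm the topology cache with every host
 * from the hosts (include) list BEFORE the inter tracker server accepts heartbeats.
 * Class and method names are illustrative only, not the actual JobTracker code.
 */
public class TopologyWarmUpSketch {

    /** Minimal cache-then-script mapping (stand-in for a CachedDNSToSwitchMapping). */
    public static class CachedMapping {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final AtomicInteger scriptRuns = new AtomicInteger();

        /** Stand-in for the expensive topology script we want to avoid at heartbeat time. */
        private String runTopologyScript(String host) {
            scriptRuns.incrementAndGet();
            return "/rack-" + Math.floorMod(host.hashCode(), 4);
        }

        public List<String> resolve(List<String> hosts) {
            List<String> racks = new ArrayList<>(hosts.size());
            for (String host : hosts) {
                racks.add(cache.computeIfAbsent(host, this::runTopologyScript));
            }
            return racks;
        }

        public int scriptRuns() {
            return scriptRuns.get();
        }
    }

    /** Step 1 (before the server starts): resolve every listed host to fill the cache. */
    public static void warmUp(CachedMapping mapping, Set<String> hostsFromIncludeFile) {
        mapping.resolve(new ArrayList<>(hostsFromIncludeFile));
    }

    /** Step 2 (after the server starts): a heartbeat from a listed host is a cache hit. */
    public static String onHeartbeat(CachedMapping mapping, String trackerHost) {
        return mapping.resolve(List.of(trackerHost)).get(0);
    }

    public static void main(String[] args) {
        Set<String> hosts = new LinkedHashSet<>(List.of("tt-001", "tt-002", "tt-003"));
        CachedMapping mapping = new CachedMapping();

        warmUp(mapping, hosts);      // one script run per listed host: 3 runs
        // ... the inter tracker server would be started here ...
        for (String h : hosts) {
            onHeartbeat(mapping, h); // cache hits: no additional script runs
        }
        System.out.println("script runs: " + mapping.scriptRuns()); // prints "script runs: 3"
    }
}
```

In this sketch, a host missing from the hosts list still takes the slow path on
its first heartbeat, so the warm-up only helps as much as the hosts list is
complete.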



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
