[
https://issues.apache.org/jira/browse/MAPREDUCE-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291001#comment-14291001
]
Hadoop QA commented on MAPREDUCE-6224:
--------------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12694413/MAPREDUCE-6224.branch-1.000.patch
against trunk revision 3703965.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5119//console
This message is automatically generated.
> resolve the hosts in DNSToSwitchMapping before inter tracker server start to
> avoid IPC timeout in Task Tracker heartbeat
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6224
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6224
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: MAPREDUCE-6224.branch-1.000.patch
>
>
> Resolve the hosts to fill the cache in CachedDNSToSwitchMapping before the
> inter-tracker server starts, to avoid IPC timeouts in Task Tracker heartbeats.
> We saw IPC timeouts in Task Tracker heartbeats on a large MR1 cluster that
> uses a topology script (run via ShellCommandExecutor) to resolve the network
> topology for Task Tracker hosts in ScriptBasedMapping.
> The reason is as follows.
> Right after the inter-tracker server starts in the Job Tracker, the Job
> Tracker receives a large number of heartbeats from the Task Trackers.
> The heartbeat function calls resolveAndAddToTopology to resolve the network
> topology for the Task Tracker host in ScriptBasedMapping, which extends
> CachedDNSToSwitchMapping.
> ScriptBasedMapping#resolve checks whether the host is in the cache.
> If the host is not in the cache, it runs the topology script to get the
> host's network topology using ShellCommandExecutor. Running the topology
> script is normally time consuming, which may cause IPC timeouts if too many
> heartbeats arrive at the same time on a large MR1 cluster.
> The solution is to resolve the network topology for all hosts in the hosts
> list from HostsFileReader before receiving any heartbeat from a Task Tracker,
> so that the cache in ScriptBasedMapping is filled. When a heartbeat then
> calls resolveAndAddToTopology, it gets the result from the cache instead of
> running the topology script.
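
The pre-warming idea described above can be illustrated with a small
standalone sketch. This is not the code from the attached patch and it does
not use the Hadoop classes: CachedRackResolver, slowLookup and PrewarmSketch
are hypothetical names, the slow lookup merely stands in for the topology
script, and the host names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for a cached host-to-rack mapping (not a Hadoop class). */
class CachedRackResolver {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumption: this function plays the role of the slow topology script.
    private final Function<String, String> slowLookup;
    private final AtomicInteger slowCalls = new AtomicInteger();

    CachedRackResolver(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Returns one rack per host; the slow lookup runs only on a cache miss. */
    List<String> resolve(List<String> hosts) {
        List<String> racks = new ArrayList<>(hosts.size());
        for (String host : hosts) {
            racks.add(cache.computeIfAbsent(host, h -> {
                slowCalls.incrementAndGet();
                return slowLookup.apply(h);
            }));
        }
        return racks;
    }

    /** Number of times the slow lookup has actually been invoked. */
    int slowCallCount() {
        return slowCalls.get();
    }
}

class PrewarmSketch {
    public static void main(String[] args) {
        CachedRackResolver resolver = new CachedRackResolver(host -> "/default-rack");

        // Pre-warm: resolve every known host before any heartbeat is accepted.
        List<String> knownHosts =
            List.of("tt1.example.com", "tt2.example.com", "tt3.example.com");
        resolver.resolve(knownHosts);
        int afterWarmup = resolver.slowCallCount();

        // Simulated heartbeat from a known host: served from the cache.
        resolver.resolve(List.of("tt2.example.com"));
        int afterHeartbeat = resolver.slowCallCount();

        System.out.println("slow lookups after warm-up: " + afterWarmup
            + ", after heartbeat: " + afterHeartbeat);
    }
}
```

In this sketch the slow lookup count does not change when the heartbeat
arrives, because the host was already resolved during warm-up. A host that is
missing from the known hosts list would still take the slow path on its first
heartbeat.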