[ https://issues.apache.org/jira/browse/MAPREDUCE-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291001#comment-14291001 ]

Hadoop QA commented on MAPREDUCE-6224:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12694413/MAPREDUCE-6224.branch-1.000.patch
  against trunk revision 3703965.

    {color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5119//console

This message is automatically generated.

> resolve the hosts in DNSToSwitchMapping before inter tracker server start to 
> avoid IPC timeout in Task Tracker heartbeat
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6224
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6224
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-6224.branch-1.000.patch
>
>
> Resolve the hosts to fill the cache in CachedDNSToSwitchMapping before the
> inter-tracker server starts, to avoid IPC timeouts in Task Tracker heartbeats.
> We saw IPC timeouts in Task Tracker heartbeats on a large MR1 cluster
> that uses a topology script (run through ShellCommandExecutor) to resolve the
> network topology for Task Tracker hosts in ScriptBasedMapping.
> The reason is as follows.
> Right after the inter-tracker server starts in the Job Tracker, the Job
> Tracker receives a large number of heartbeats from the Task Trackers.
> The heartbeat function calls resolveAndAddToTopology to resolve the network
> topology for the Task Tracker host in ScriptBasedMapping, which extends
> CachedDNSToSwitchMapping.
> ScriptBasedMapping#resolve checks whether the host is in the cache.
> If the host is not in the cache, it runs the topology script through
> ShellCommandExecutor to get the host's network topology. Running the topology
> script is normally time-consuming, which may cause IPC timeouts if too many
> heartbeats arrive at the same time on a large MR1 cluster.
> The solution is to resolve the network topology for all hosts in the hosts
> list from HostsFileReader before receiving any heartbeat from a Task Tracker,
> so that the cache in ScriptBasedMapping is already filled. When a heartbeat
> then calls resolveAndAddToTopology, it gets the result from the cache instead
> of running the topology script.
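
The cache warm-up idea in the description above can be sketched in plain Java. This is only an illustration and not the code in the attached patch: CachingRackResolver, slowLookup, slowLookupCount, TopologyWarmupDemo and the host names are all hypothetical, and the lambda merely stands in for the topology script that Hadoop runs through ShellCommandExecutor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a caching host-to-rack mapping; not a Hadoop class.
final class CachingRackResolver {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Assumption: stands in for the expensive topology script invocation.
    private final Function<String, String> slowLookup;
    private final AtomicInteger slowCalls = new AtomicInteger();

    CachingRackResolver(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // Returns one rack path per host; the slow lookup runs only on a cache miss.
    List<String> resolve(List<String> hosts) {
        List<String> racks = new ArrayList<>(hosts.size());
        for (String host : hosts) {
            racks.add(cache.computeIfAbsent(host, h -> {
                slowCalls.incrementAndGet();
                return slowLookup.apply(h);
            }));
        }
        return racks;
    }

    // Number of times the slow lookup has actually been invoked.
    int slowLookupCount() {
        return slowCalls.get();
    }
}

final class TopologyWarmupDemo {
    public static void main(String[] args) {
        // Hypothetical hosts list, as if read from the hosts file.
        List<String> knownHosts = Arrays.asList("tt-host-1", "tt-host-2", "tt-host-3");
        CachingRackResolver resolver = new CachingRackResolver(host -> "/default-rack");

        // Warm-up: resolve every known host before any heartbeat is accepted.
        resolver.resolve(knownHosts);
        int afterWarmup = resolver.slowLookupCount();

        // Simulated heartbeat from a known host: served from the cache.
        resolver.resolve(Arrays.asList("tt-host-2"));

        System.out.println("slow lookups during warm-up: " + afterWarmup);
        System.out.println("slow lookups after heartbeat: " + resolver.slowLookupCount());
    }
}
```

In this sketch the slow lookup count stays the same after the simulated heartbeat, because the host was already resolved during warm-up. A host that is missing from the hosts list would still hit the slow path on its first heartbeat.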



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
