[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997278#comment-12997278
 ] 

Kang Xiao commented on MAPREDUCE-2340:
--------------------------------------

For large jobs, job initialization seems to be very slow. The cause is that 
JobInProgress.initTasks() calls createCache() to build the localization cache 
list. For each split location, createCache() calls 
jobtracker.resolveAndAddToTopology(host) to get its topology node object. 
However, the jobtracker already keeps a hostname => topology node map cache 
that can be used to speed up the lookup of a node by hostname. 
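
As a rough illustration of the idea (not the actual patch), the sketch below 
checks a hostname => node map first and falls back to the slow resolution 
path only on a miss. TopoNode, TopologyCacheSketch, getNode() and the 
slowResolutions counter are hypothetical names made up for this example; only 
hostnameToNodeMap and resolveAndAddToTopology() are named after the 
identifiers mentioned in this issue, and their bodies here are simplified 
stand-ins rather than the real JobTracker code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a topology node; not Hadoop's real node class.
final class TopoNode {
    final String host;
    final String rack;

    TopoNode(String host, String rack) {
        this.host = host;
        this.rack = rack;
    }
}

// Hypothetical, single-threaded sketch of a tracker that caches resolved nodes.
class TopologyCacheSketch {
    // Named after the cache mentioned in the issue; the contents are assumed.
    private final Map<String, TopoNode> hostnameToNodeMap = new HashMap<>();

    // Counts how often the slow path runs, for demonstration only.
    int slowResolutions = 0;

    // Stand-in for the expensive resolution; the rack assignment is fake.
    TopoNode resolveAndAddToTopology(String host) {
        slowResolutions++;
        String rack = "/rack-" + Math.floorMod(host.hashCode(), 4);
        TopoNode node = new TopoNode(host, rack);
        hostnameToNodeMap.put(host, node);
        return node;
    }

    // Consult the cache first; resolve only when the host is not cached yet.
    TopoNode getNode(String host) {
        TopoNode node = hostnameToNodeMap.get(host);
        return node != null ? node : resolveAndAddToTopology(host);
    }
}
```

With a lookup like this, a host that appears in many split locations would 
be resolved once and served from the map afterwards. 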

> optimize JobInProgress.initTasks()
> ----------------------------------
>
>                 Key: MAPREDUCE-2340
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2340
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.20.1, 0.21.0
>            Reporter: Kang Xiao
>
> JobTracker's hostnameToNodeMap cache can speed up JobInProgress.initTasks() 
> and JobInProgress.createCache() significantly. A test with one job of 100000 
> maps on a 2400-node cluster shows speedups of nearly 10x and 50x for 
> initTasks() and createCache(), respectively. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
