[ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719396#action_12719396 ]
Devaraj Das commented on HADOOP-6026:
-------------------------------------

If you are using ScriptBasedMapping as the implementation for resolution, the problem outlined in this jira doesn't exist. CachedDNSToSwitchMapping, which ScriptBasedMapping extends, already does the necessary caching. In fact, I don't think we should do this caching in the core framework (and then start worrying about the cache timeout, etc.). This should be left to the implementations of DNSToSwitchMapping. Thoughts?

> Improve the performance efficiency of task initialization at the JobTracker
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-6026
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6026
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: dhruba borthakur
>            Assignee: Zheng Shao
>         Attachments: HADOOP-6026.1.patch
>
>
> The JobTracker reads the splits for a job at job initialization time. Then,
> for each location in the split, it invokes DNSToSwitchMapping.resolve().
> This, in turn, typically invokes an external script that resolves the
> hostname to a network rack location. The time spent invoking this external
> script can be reduced if hostnames and their rack locations are inserted
> into a cache. JobTracker.resolveAndAddToTopology() can look up this cache
> first and avoid invoking the external "resolve" script in most cases.
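
To make the point about caching inside the mapping implementation concrete, here is a minimal sketch of a caching decorator. It is not the Hadoop code: RackResolver and CachingRackResolver are hypothetical names, and the interface only assumes a resolve() that takes a list of host names and returns one rack path per name in the same order. The sketch also assumes the delegate never returns null entries, and it deliberately has no eviction or timeout, which is exactly the policy question the comment suggests leaving to each implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a rack-resolution interface; not the Hadoop API.
interface RackResolver {
    // Returns one rack path per input host name, in the same order.
    List<String> resolve(List<String> names);
}

// Hypothetical caching decorator: only host names not seen before reach the delegate.
final class CachingRackResolver implements RackResolver {
    private final RackResolver delegate;
    // Assumption: entries never expire and the delegate never returns null racks.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingRackResolver(RackResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> resolve(List<String> names) {
        // Collect the distinct names that are not cached yet.
        List<String> misses = new ArrayList<>();
        for (String name : names) {
            if (!cache.containsKey(name) && !misses.contains(name)) {
                misses.add(name);
            }
        }
        // One batched call to the (possibly expensive) delegate for all misses.
        if (!misses.isEmpty()) {
            List<String> racks = delegate.resolve(misses);
            for (int i = 0; i < misses.size(); i++) {
                cache.put(misses.get(i), racks.get(i));
            }
        }
        // Answer every requested name from the cache, preserving input order.
        List<String> result = new ArrayList<>(names.size());
        for (String name : names) {
            result.add(cache.get(name));
        }
        return result;
    }
}
```

With a wrapper like this, a caller such as the JobTracker would keep calling resolve() for every split location, and only the first lookup of each host would pay for the external script.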