[
https://issues.apache.org/jira/browse/MAPREDUCE-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925738#action_12925738
]
Kang Xiao commented on MAPREDUCE-2116:
--------------------------------------
bq. That's because in getTasksToKill(), the shouldClose() got called so many
times.
Profiling shows that getTasksToSave() also calls taskidToTIPMap.get() but uses
much less CPU. The main differences between getTasksToKill() and
getTasksToSave() are:
* tip.shouldCommit(taskId) is simpler than tip.shouldClose(taskId); the patch
has optimized tip.shouldClose(taskId).
* getTasksToKill() iterates over the task attempt set returned by
trackerToTaskMap.get(taskTracker), while getTasksToSave() iterates over the
tasktrackerstatus.getTaskReports() set. The set that getTasksToSave() iterates
over is smaller than the one getTasksToKill() iterates over, so this part
still needs to be optimized.
It may not be necessary for getTasksToKill() to iterate over
trackerToTaskMap.get(taskTracker), which holds all running and completed
tasks of the tasktracker:
# running tasks are included in tasktrackerstatus.getTaskReports()
# a user-killed task cannot be completed, according to
TaskInProgress.killTask()
# completed maps on the tasktracker can be purged by KillJobAction upon job
termination
# lostTaskTracker() clears the tasks for the tasktracker by calling
trackerToTaskMap.remove(taskTracker)
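
To illustrate why the size of the iterated set matters, here is a minimal
sketch. It is not the JobTracker code: KillScanSketch, selectToKill() and
countChecks() are hypothetical names, attempt ids are plain strings, and the
"_killed" suffix test is only a stand-in for tip.shouldClose(taskId). It just
shows that the number of shouldClose-style checks made under the lock equals
the size of whichever candidate set is scanned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical, simplified stand-in; none of these names are Hadoop APIs. */
class KillScanSketch {

    /** Returns the attempts in candidates for which shouldClose holds. */
    static List<String> selectToKill(Iterable<String> candidates,
                                     Predicate<String> shouldClose) {
        List<String> toKill = new ArrayList<>();
        for (String attemptId : candidates) {
            if (shouldClose.test(attemptId)) {
                toKill.add(attemptId);
            }
        }
        return toKill;
    }

    /** Counts how many times the check runs for a given candidate set. */
    static int countChecks(List<String> candidates) {
        AtomicInteger checks = new AtomicInteger();
        selectToKill(candidates, id -> {
            checks.incrementAndGet();
            // Assumed stand-in for tip.shouldClose(taskId).
            return id.endsWith("_killed");
        });
        return checks.get();
    }
}
```

If the candidate list is the full per-tracker attempt set, countChecks() grows
with every attempt the tracker has ever run for live jobs; if it is only the
attempts reported in the current heartbeat, it is bounded by what the tracker
is reporting. Whether the smaller set is sufficient depends on the four points
listed above holding.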
> optimize getTasksToKill to reduce JobTracker contention
> -------------------------------------------------------
>
> Key: MAPREDUCE-2116
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2116
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker
> Reporter: Joydeep Sen Sarma
> Attachments: 2116.1.patch, 2116.2.patch, 2116.3.patch,
> getTaskToKill.JPG
>
>
> getTasksToKill shows up as one of the top routines holding the JT lock.
> Specifically, the translation from attemptid to tip is very expensive:
> at java.util.TreeMap.getEntry(TreeMap.java:328)
> at java.util.TreeMap.get(TreeMap.java:255)
> at
> org.apache.hadoop.mapred.TaskInProgress.shouldClose(TaskInProgress.java:500)
> at
> org.apache.hadoop.mapred.JobTracker.getTasksToKill(JobTracker.java:3464)
> locked <0x00002aab6ebb6640> (a org.apache.hadoop.mapred.JobTracker)
> at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3181)
> this seems like an avoidable expense since the tip for a given attempt is
> fixed (and one should not need a map lookup to find the association). on a
> different note - not clear to me why TreeMaps are in use here (i didn't find
> any iteration over these maps). any background info on why things are
> arranged the way they are would be useful.