[
https://issues.apache.org/jira/browse/HADOOP-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566474#action_12566474
]
Amar Kamat commented on HADOOP-2790:
------------------------------------
bq. Also, another obvious optimization is to check whether the speculative
execution flag is true up front.
I noticed that too a few days back, but I thought HADOOP-2141 might fix it.
----
With HADOOP-2119, the number of calls to {{hasSpeculative()}} might drop, since
we are optimizing the look-ups for higher-priority runnable tasks and avoiding
speculative ones entirely in these look-ups. The check for speculative tasks
will then be done only if we have nothing else to run. But +1 to doing better
than making all the checks every time.
The following conditions are used to decide
{{TaskInProgress.hasSpeculative()}} :
- activeTasks.size() <= MAX_TASK_EXECS _[seems ok]_
- runSpeculative _[should be done earlier, but ok]_
- averageProgress - progress >= SPECULATIVE_GAP _[seems ok]_
- System.currentTimeMillis() - startTime >= SPECULATIVE_LAG :
This could be checked once in {{TaskInProgress.recomputeProgress()}}, and
{{hasSpeculative()}} would repeat the check only if the earlier one returned
{{false}}. I guess we can still do better, but my guess is that we can't
totally avoid {{System.currentTimeMillis()}} in
{{TaskInProgress.hasSpeculative()}}, no?
- completes == 0 _[ok]_
- !isOnlyCommitPending() :
Maybe a Map of _COMMIT_PENDING_ tasks can be maintained in
_TaskInProgress_, so that the only check made is {{commitPendingStatuses.size() > 0
&& commitPendingStatuses.contains(taskId)}}. The space requirement stays the
same; the re-arrangement would be done in {{TaskInProgress.recomputeProgress()}}.
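
To make the ordering above concrete, here is a rough sketch, not the actual Hadoop code. The class {{SpeculativeCheckSketch}}, its fields, the injected clock, and the constant values are all hypothetical stand-ins for {{TaskInProgress}}. It assumes the cheap {{runSpeculative}} flag is tested first, the commit-pending state is kept in a set updated on progress reports, and the lag result is cached once it becomes true.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for TaskInProgress; not the actual Hadoop class. */
class SpeculativeCheckSketch {
    // Names follow the comment above; the values are placeholders.
    static final int MAX_TASK_EXECS = 1;
    static final double SPECULATIVE_GAP = 0.2;
    static final long SPECULATIVE_LAG = 60_000L;

    private final boolean runSpeculative;
    private final long startTime;
    private final LongSupplier clock; // assumption: injected so clock reads can be counted
    private final Set<String> activeTasks = new HashSet<>();
    private final Set<String> commitPendingTasks = new HashSet<>();
    private double progress = 0.0;
    private int completes = 0;
    private boolean lagElapsed = false; // sticky: never reset once true

    SpeculativeCheckSketch(boolean runSpeculative, long startTime, LongSupplier clock) {
        this.runSpeculative = runSpeculative;
        this.startTime = startTime;
        this.clock = clock;
    }

    void addActiveTask(String taskId) {
        activeTasks.add(taskId);
    }

    /** Stand-in for recomputeProgress(): refreshes cached state on a status update. */
    void recomputeProgress(double newProgress, boolean commitPending, String taskId) {
        progress = newProgress;
        if (commitPending) {
            commitPendingTasks.add(taskId);
        } else {
            commitPendingTasks.remove(taskId);
        }
        if (!lagElapsed) {
            lagElapsed = clock.getAsLong() - startTime >= SPECULATIVE_LAG;
        }
    }

    /** True if any attempt is in commit pending; no iteration over statuses. */
    boolean isOnlyCommitPending() {
        return !commitPendingTasks.isEmpty();
    }

    boolean hasSpeculative(double averageProgress) {
        if (!runSpeculative) {
            return false; // cheapest check up front
        }
        if (completes != 0 || activeTasks.size() > MAX_TASK_EXECS) {
            return false;
        }
        if (averageProgress - progress < SPECULATIVE_GAP) {
            return false;
        }
        if (isOnlyCommitPending()) {
            return false;
        }
        // The clock is read only while the lag has not yet been seen to elapse.
        if (!lagElapsed) {
            lagElapsed = clock.getAsLong() - startTime >= SPECULATIVE_LAG;
        }
        return lagElapsed;
    }
}
```

In this sketch the clock is still consulted inside {{hasSpeculative()}} until the lag has elapsed, which matches the guess above that the call cannot be avoided entirely.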
----
Comments?
> TaskInProgress.hasSpeculativeTask is very inefficient
> -----------------------------------------------------
>
> Key: HADOOP-2790
> URL: https://issues.apache.org/jira/browse/HADOOP-2790
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: Owen O'Malley
> Fix For: 0.16.1
>
>
> Each call to JobInProgress.findNewTask can call
> TaskInProgress.hasSpeculativeTask once per task. Each call to
> hasSpeculativeTask calls System.currentTimeMillis, which can result in
> hundreds of thousands of calls to currentTimeMillis. Additionally, it
> calls TaskInProgress.isOnlyCommitPending, which calls .values() on the map
> from task id to host name and iterates through them to see if any of the
> tasks are in commit pending. It would be better to have a commit pending
> boolean flag in the TaskInProgress. It also looks like there are other
> opportunities here, but those jumped out at me. We should also look at this
> method in the profiler.