[
https://issues.apache.org/jira/browse/HADOOP-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700550#action_12700550
]
Devaraj Das edited comment on HADOOP-5632 at 4/18/09 11:40 PM:
---------------------------------------------------------------
If we go the route of lightweight/heavyweight heartbeats, I'd suggest that we
explicitly call those out as separate RPCs. TaskTrackers make certain
assumptions about a successful heartbeat, and since a TaskTracker always sends
a regular (heavyweight) heartbeat, there is a problem with status reporting
for KILLED/FAILED tasks. Assume that, at a certain TaskTracker node, some
task(s) fail just before the heartbeat is sent. The TaskTracker sends the
status of those tasks, and the JobTracker processes this heartbeat as a
lightweight one (and therefore skips the processing of status updates). The
TaskTracker removes these tasks from the runningTasks map after getting the
heartbeat response, and won't report their statuses again. The JobTracker
will be unaware of such task failures.
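
To make the failure mode concrete, here is a minimal sketch of the scenario
above. It is not Hadoop code: HeartbeatSketch, its nested JobTracker and
TaskTracker classes, the overloaded flag, and all method names are
hypothetical stand-ins; only the runningTasks map name comes from the comment.
It assumes the JobTracker decides on its own whether to treat a heartbeat as
lightweight, and contrasts that with two explicit RPCs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types are Hadoop's real classes.
class HeartbeatSketch {
    enum State { RUNNING, FAILED, KILLED }

    /** Toy JobTracker that records only the statuses it actually processes. */
    static class JobTracker {
        final Map<String, State> known = new HashMap<>();
        boolean overloaded = false; // assumed trigger for the lightweight path

        // Single RPC: the server silently decides whether to process statuses.
        void heartbeat(Map<String, State> statuses) {
            if (!overloaded) {
                known.putAll(statuses);
            }
        }

        // Separate RPCs: the caller knows which contract applies.
        void heavyweightHeartbeat(Map<String, State> statuses) {
            known.putAll(statuses);
        }

        void lightweightHeartbeat() {
            // Liveness only; no task statuses are carried or processed.
        }
    }

    /** Toy TaskTracker holding the statuses it still has to report. */
    static class TaskTracker {
        final Map<String, State> runningTasks = new HashMap<>();

        // Implicit scheme: any successful response is taken to mean that the
        // statuses were processed, so finished tasks are dropped.
        void sendImplicit(JobTracker jt) {
            jt.heartbeat(new HashMap<>(runningTasks));
            purgeFinished();
        }

        // Explicit scheme: finished tasks are dropped only after a
        // heavyweight heartbeat, which is known to process statuses.
        void sendExplicit(JobTracker jt, boolean lightweight) {
            if (lightweight) {
                jt.lightweightHeartbeat();
            } else {
                jt.heavyweightHeartbeat(new HashMap<>(runningTasks));
                purgeFinished();
            }
        }

        private void purgeFinished() {
            runningTasks.values().removeIf(s -> s != State.RUNNING);
        }
    }
}
```

With overloaded set to true, sendImplicit drops a FAILED task from
runningTasks without the JobTracker ever recording it. With sendExplicit, the
task stays in the map across lightweight heartbeats and is reported by the
next heavyweight one.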
Also, maybe we should process the failed/killed tasks' statuses in the
lightweight heartbeat as well. The logic is that failed/killed tasks should
be given the same treatment as virgin tasks. It actually makes sense to give
higher priority to failed tasks during task assignment: if there is a
deterministic failure on every attempt, the job would fail fast (after a
certain number of attempts of the same task), leading to better cluster
utilization.
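
The fail-fast argument can be illustrated with a toy, single-slot simulation.
This is a sketch under stated assumptions, not the JobTracker's scheduler:
FailFastSketch, PendingTask, slotsUsedBeforeJobFails and the MAX_ATTEMPTS
value of 4 are all hypothetical, tasks run one at a time, and exactly one task
fails deterministically on every attempt.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch; not Hadoop's scheduling code.
class FailFastSketch {
    static final int MAX_ATTEMPTS = 4; // assumed per-task attempt limit

    static class PendingTask {
        final boolean alwaysFails; // true for the deterministically bad task
        final int failedAttempts;  // failures so far
        final long seq;            // arrival order, used as a tie-breaker

        PendingTask(boolean alwaysFails, int failedAttempts, long seq) {
            this.alwaysFails = alwaysFails;
            this.failedAttempts = failedAttempts;
            this.seq = seq;
        }
    }

    /**
     * Counts how many slot assignments are consumed before the job is
     * declared failed. The bad task is queued first, followed by
     * virginTasks healthy tasks that succeed on their first attempt.
     */
    static int slotsUsedBeforeJobFails(int virginTasks, boolean failedFirst) {
        // failedFirst: most-failed task first, then arrival order.
        // Otherwise: plain FIFO, so a retry goes to the back of the queue.
        Comparator<PendingTask> order = failedFirst
            ? Comparator.comparingInt((PendingTask t) -> -t.failedAttempts)
                        .thenComparingLong(t -> t.seq)
            : Comparator.comparingLong((PendingTask t) -> t.seq);
        PriorityQueue<PendingTask> queue = new PriorityQueue<>(order);

        long seq = 0;
        queue.add(new PendingTask(true, 0, seq++));
        for (int i = 0; i < virginTasks; i++) {
            queue.add(new PendingTask(false, 0, seq++));
        }

        int slotsUsed = 0;
        while (!queue.isEmpty()) {
            PendingTask task = queue.poll();
            slotsUsed++;
            if (!task.alwaysFails) {
                continue; // healthy task succeeded and leaves the queue
            }
            int failures = task.failedAttempts + 1;
            if (failures >= MAX_ATTEMPTS) {
                return slotsUsed; // job fails here
            }
            queue.add(new PendingTask(true, failures, seq++));
        }
        return slotsUsed; // not reached when a bad task is present
    }
}
```

In this toy model with 10 healthy tasks, retrying the failed task first ends
the doomed job after 4 slot assignments, whereas FIFO retries consume 14,
which is the utilization gain the paragraph above argues for.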
was (Author: devaraj):
If we go the route of lightweight/heavyweight heartbeats, I'd suggest that
we explicitly call those out as separate RPCs. TaskTrackers make certain
assumptions about a successful heartbeat, and since a TaskTracker always sends
a regular (heavyweight) heartbeat, there is a problem with status reporting
for KILLED/FAILED tasks. Assume that, at a certain TaskTracker node, some
task(s) fail just before the heartbeat is sent. The TaskTracker sends the
status of those tasks. The TaskTracker removes these tasks from the
runningTasks map after getting the heartbeat response, and won't report their
statuses again. The JobTracker will be unaware of such task failures.
Also, maybe we should process the failed/killed tasks' statuses in the
lightweight heartbeat as well. The logic is that failed/killed tasks should
be given the same treatment as virgin tasks. It actually makes sense to give
higher priority to failed tasks during task assignment: if there is a
deterministic failure on every attempt, the job would fail fast (after a
certain number of attempts of the same task), leading to better cluster
utilization.
> Jobtracker leaves tasktrackers underutilized
> --------------------------------------------
>
> Key: HADOOP-5632
> URL: https://issues.apache.org/jira/browse/HADOOP-5632
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
> Environment: 2x HT 2.8GHz Intel Xeon, 3GB RAM, 4x 250GB HD linux
> boxes, 100 node cluster
> Reporter: Khaled Elmeleegy
> Attachments: hadoop-khaled-tasktracker.10s.uncompress.timeline.pdf,
> hadoop-khaled-tasktracker.150ms.uncompress.timeline.pdf, jobtracker.patch,
> jobtracker20.patch
>
>
> For some workloads, the jobtracker doesn't keep all the slots utilized even
> under heavy load.