[
https://issues.apache.org/jira/browse/MAPREDUCE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738274#action_12738274
]
Hemanth Yamijala commented on MAPREDUCE-802:
--------------------------------------------
+1 in general to remove the old status. It is error-prone, and has not been
needed other than for lookup, which can be done in other ways.
bq. whenever JobInProgress changes its state, we route the associated event to
JobTracker. This will ensure that any part of code which changes the JobStatus
would actually result in events being raised.
We should raise the events at the lowest common point in the code paths. For
example, all completed jobs pass through an API such as garbageCollect(), so
the run state change event should be raised there. One of the problems with the
current approach is that this event is raised in many places. I think we should
write the code in such a way that repeated events of the same type are a no-op.
IOW, if the scheduler has already removed a job from its queue, another call to
repeat the action should be a no-op.
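To make the no-op idea concrete, here is a rough sketch in Java. The class and method names (SchedulerQueueSketch, jobAdded, jobCompleted) are hypothetical and are not taken from the JobTracker or capacity scheduler code; job IDs are plain strings to keep it self-contained.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for a scheduler's per-queue bookkeeping. */
class SchedulerQueueSketch {
    // Job IDs currently held in the queue, kept in submission order.
    private final Set<String> jobs = new LinkedHashSet<>();

    /** Adds a job; adding the same job ID twice has no further effect. */
    void jobAdded(String jobId) {
        jobs.add(jobId);
    }

    /**
     * Handles a "job completed" event. Returns true only when the job was
     * actually removed; a repeated event for the same job is a no-op and
     * returns false.
     */
    boolean jobCompleted(String jobId) {
        return jobs.remove(jobId);
    }

    int size() {
        return jobs.size();
    }
}
```

With handling written this way, it would not matter if more than one code path raises the completion event for the same job.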
bq. ...scheduler would have to maintain its association between job to job
scheduling info i.e. a Map<JobID,JobSchedulingInfo>...
An alternative would be to iterate over the jobs whenever a removal is
required. This is linear in the number of jobs submitted, but would save on
memory. Even for 1000 jobs this might not be such a bad deal. Thoughts?
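A minimal sketch of the scan-on-removal option, assuming the scheduler keeps its per-job info in a list ordered by submission. JobSchedulingInfoSketch and ScanningJobQueueSketch are hypothetical names used only for illustration, not the scheduler's actual types.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical per-job scheduling info; only the job ID matters here. */
class JobSchedulingInfoSketch {
    final String jobId;

    JobSchedulingInfoSketch(String jobId) {
        this.jobId = jobId;
    }
}

/** Keeps jobs in submission order with no separate JobID-to-info map. */
class ScanningJobQueueSketch {
    private final List<JobSchedulingInfoSketch> jobs = new LinkedList<>();

    void add(JobSchedulingInfoSketch info) {
        jobs.add(info);
    }

    /**
     * Removes the job by scanning the list, which costs time linear in the
     * number of queued jobs. Returns false if the job is not present.
     */
    boolean remove(String jobId) {
        Iterator<JobSchedulingInfoSketch> it = jobs.iterator();
        while (it.hasNext()) {
            if (it.next().jobId.equals(jobId)) {
                it.remove();
                return true;
            }
        }
        return false;
    }

    int size() {
        return jobs.size();
    }
}
```

The trade-off is the one described above: no extra map to keep in sync, at the cost of a scan on each removal.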
> Simplify the job updated event notification between Jobtracker and schedulers
> -----------------------------------------------------------------------------
>
> Key: MAPREDUCE-802
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-802
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker
> Reporter: Hemanth Yamijala
> Assignee: Sreekanth Ramakrishnan
>
> HADOOP-4053 and HADOOP-4149 added events so that updates to the state or
> properties of a job, such as its run state or priority, are notified to the
> scheduler. We've seen some issues with this framework, such as the following:
> - Events are not raised correctly at all places. If a new code path is added
> to kill a job, raising the event gets missed.
> - Events are raised with incorrect event data. For example, the start time
> value is typically left out.
> The resulting contract break between jobtracker and schedulers has led to
> problems in the capacity scheduler where jobs remain stuck in the queue
> without ever being removed, and so on.
> It has proven complicated to get this right in the framework: fixes have
> typically still left dangling cases, or new code paths have introduced new bugs.
> This JIRA is about trying to simplify the interaction model so that it is
> more robust and works well.