[
https://issues.apache.org/jira/browse/HADOOP-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627918#action_12627918
]
Amar Kamat commented on HADOOP-4018:
------------------------------------
bq. Can you please explain which portion of code you are referring to here?
Look at how the job is initialized. If the initialization fails, there is
already a cleanup process associated with it, so simply throwing an exception
would work; there is no need to explicitly set the job state and finish-time.
bq. This API is used by JobInProgress.initTasks. This method computes the
number of tasks that is needed by this job.
Oops! I missed that. But it is still flawed, as I have mentioned in comment #3.
Please check.
bq. Regarding 3 and 4 I agree with you that it is better if I can check these
limits in the constructor of JobInProgress. ....
I just checked, and it seems that the job client never overwrites the number of
maps to be spawned. Since the num-maps value passed by the user is just a hint
to the JobClient while calculating the splits, this information is of no use to
the JobTracker; hence the JobClient can overwrite the num-maps parameter with
the actual split count before uploading {{job.xml}} to the DFS. With this, a
job that should fail will fail fast (i.e. in the constructor itself) and the
user will be informed why the job failed.
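Roughly, the fail-fast check could look like the sketch below. The property names ({{mapred.map.tasks}} for the client-written split count and a per-job task limit) and the {{FakeJobInProgress}} class are illustrative assumptions, not the exact Hadoop identifiers or signatures.

{code:java}
// Rough sketch of the fail-fast idea: the client writes the real split count
// into the job configuration before uploading job.xml, and the constructor
// rejects over-limit jobs immediately. Names here are illustrative only.
import java.io.IOException;
import java.util.Properties;

public class FakeJobInProgress {
    private final int numMapTasks;

    FakeJobInProgress(Properties jobConf, int maxTasksPerJob) throws IOException {
        // Assumes the client has already overwritten this value with the
        // actual number of splits it computed.
        this.numMapTasks =
            Integer.parseInt(jobConf.getProperty("mapred.map.tasks", "1"));
        if (maxTasksPerJob > 0 && numMapTasks > maxTasksPerJob) {
            // Failing in the constructor means the submitting user sees the
            // reason right away, instead of a later init-time failure.
            throw new IOException("The number of map tasks (" + numMapTasks
                + ") exceeds the configured limit (" + maxTasksPerJob + ")");
        }
    }

    public static void main(String[] args) throws Exception {
        Properties jobConf = new Properties();
        jobConf.setProperty("mapred.map.tasks", "120000");
        try {
            new FakeJobInProgress(jobConf, 100000);
        } catch (IOException e) {
            System.out.println("Job rejected at submission: " + e.getMessage());
        }
    }
}
{code}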
Comment #3 just states that {{totalNumTasks()}} will also count tasks from
non-running (i.e. killed/completed/failed) jobs. So {{totalNumTasks()}} should
only take {{RUNNING}} jobs into consideration when calculating the total number
of tasks.
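A small sketch of that counting fix, for clarity; {{JobState}} and {{JobStub}} below are hypothetical stand-ins for the real JobStatus/JobInProgress classes, not the actual API:

{code:java}
// Only jobs in the RUNNING state contribute to the total, so killed, completed
// and failed jobs no longer inflate it. Classes here are illustrative only.
import java.util.Arrays;
import java.util.List;

public class TaskCounter {
    enum JobState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    static class JobStub {
        final JobState state;
        final int numTasks;
        JobStub(JobState state, int numTasks) {
            this.state = state;
            this.numTasks = numTasks;
        }
    }

    // Mirrors the suggested totalNumTasks() fix: skip anything not RUNNING.
    static int totalNumTasks(List<JobStub> jobs) {
        int total = 0;
        for (JobStub job : jobs) {
            if (job.state == JobState.RUNNING) {
                total += job.numTasks;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<JobStub> jobs = Arrays.asList(
            new JobStub(JobState.RUNNING, 4000),
            new JobStub(JobState.KILLED, 90000),    // not counted
            new JobStub(JobState.SUCCEEDED, 2500)); // not counted
        System.out.println("Tasks across running jobs: " + totalNumTasks(jobs));
    }
}
{code}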
> limit memory usage in jobtracker
> --------------------------------
>
> Key: HADOOP-4018
> URL: https://issues.apache.org/jira/browse/HADOOP-4018
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: maxSplits.patch, maxSplits2.patch, maxSplits3.patch,
> maxSplits4.patch, maxSplits5.patch
>
>
> We have seen instances where a user submitted a job with many thousands of
> mappers. The JobTracker was running with a 3GB heap, but that was still not
> enough to prevent memory thrashing from garbage collection; effectively, the
> JobTracker was not able to serve jobs and had to be restarted.
> One simple proposal would be to limit the maximum number of tasks per job.
> This can be a configurable parameter. Are there other things that eat huge
> globs of memory in the JobTracker?
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.