[
https://issues.apache.org/jira/browse/HADOOP-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628584#action_12628584
]
Amar Kamat commented on HADOOP-4018:
------------------------------------
bq. I have a question regarding item 1 above.
They are mutually exclusive. For data-local tasks, {{runningMapCache,
nonRunningMapCache, runningReduces and nonRunningReduces}} are used. For
non-data-local tasks, {{nonLocalMaps and nonLocalRunningMaps}} are used.
bq. it violates locking hierarchy,
Yes. One thing you could do is keep a global count of all allocated tasks in
the JobTracker. A job being constructed checks the value and bails out if the
limit would be exceeded; once the job inits, it updates the count. Any
access/update to the count should be guarded. Since the count is updated only
after passing the init checks, we can be sure that limit-exceeding jobs never
get inited. So something like
{code}
In init:
1. Acquire the count lock
2. Check if count + self-tasks > limit
   2.1 If yes, release the lock and throw an exception
   2.2 Else, update the count and release the lock
{code}
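A minimal Java sketch of this scheme, using a synchronized method so the check and the update happen atomically (the class and method names here are hypothetical, not actual JobTracker code):

```java
// Hypothetical guard for a global allocated-task count, as described above.
// reserve() would be called from job init, release() from job retirement
// (see RetireJobs). None of these names exist in the real JobTracker.
public class TaskCountGuard {
    private final int limit;
    private int allocated = 0;

    public TaskCountGuard(int limit) {
        this.limit = limit;
    }

    // Atomically check the limit and update the count; the intrinsic lock
    // is released on both the success and the exception path.
    public synchronized void reserve(int jobTasks) {
        if (allocated + jobTasks > limit) {
            throw new IllegalStateException(
                "task limit exceeded: " + allocated + " + " + jobTasks
                + " > " + limit);
        }
        allocated += jobTasks;
    }

    // Called once a job is removed from the JobTracker.
    public synchronized void release(int jobTasks) {
        allocated -= jobTasks;
    }

    public synchronized int allocated() {
        return allocated;
    }
}
```

Because the exception is thrown before the count is touched, a rejected job leaves the global count unchanged, and a retired job simply gives its tasks back.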
Since the number of allocated tasks never changes once a job is inited, there
is no point in iterating over the jobs every time. Once a job is removed from
the JobTracker (see RetireJobs), update the count to reflect the change.
> limit memory usage in jobtracker
> --------------------------------
>
> Key: HADOOP-4018
> URL: https://issues.apache.org/jira/browse/HADOOP-4018
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: maxSplits.patch, maxSplits2.patch, maxSplits3.patch,
> maxSplits4.patch, maxSplits5.patch, maxSplits6.patch
>
>
> We have seen instances when a user submitted a job with many thousands of
> mappers. The JobTracker was running with a 3GB heap, but it was still not
> enough to prevent memory thrashing from garbage collection; effectively the
> JobTracker was not able to serve jobs and had to be restarted.
> One simple proposal would be to limit the maximum number of tasks per job.
> This can be a configurable parameter. Are there other things that eat huge
> globs of memory in the JobTracker?