[ https://issues.apache.org/jira/browse/MAPREDUCE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved MAPREDUCE-2235.
------------------------------------

    Resolution: Duplicate

Hi Vladimir. I think this was already covered by MAPREDUCE-1354 in trunk. Let 
me know if you disagree and we can reopen.

> JobTracker "over-synchronization" makes it hang up in certain cases 
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2235
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2235
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0
>            Reporter: Vladimir Klimontovich
>         Attachments: MAPREDUCE-2235-patch1.txt
>
>
> There is a general problem in the JobTracker.java code: it uses "this"
> synchronization everywhere, so only one method can execute at a time.
> When the job submit rate is low (less than one job every several seconds),
> the tracker works without a problem. When the job rate is high, the
> following problem occurs:
> Inside submitJob(), the JT copies the job jar and XML to the local
> filesystem and then runs "chmod" on those files. Hadoop does chmod by
> spawning a child process. When the JT heap is big (several gigabytes),
> spawning a child process takes a long time because Java calls fork(); in
> our case it takes about 1-2 seconds. So the JobTracker can't handle
> high-frequency job submissions.
> Besides that, since heartbeat() is also synchronized, the JT stops
> processing heartbeats while the "this" monitor is held by submitJob(). That
> makes the JT think that many TaskTrackers are down.
> The following solution could help:
> "chmod" is called from submitJob() underneath the following line:
> JobInProgress job = new JobInProgress(jobId, this, this.conf);
> This call could be moved out of the synchronized code:
> public JobStatus submitJob(JobID jobId) throws IOException {
>     synchronized (this) {
>         // ... the rest
>     }
>     // This line is left outside the synchronized code because it doesn't
>     // depend on the state of the JobTracker. It is also the line that
>     // triggers the slow chmod.
>     JobInProgress job = new JobInProgress(jobId, this, this.conf);
>     synchronized (this) {
>         // ... the rest
>     }
> }
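The effect described in the report can be reproduced in miniature. The following is a minimal, hypothetical Java sketch, not JobTracker code: `LockScopeDemo`, `submitCoarse`, `submitNarrow`, `heartbeat` and the 500 ms `Thread.sleep` are illustrative stand-ins for JobTracker, submitJob(), heartbeat() and the fork()-based chmod. It measures how long a heartbeat waits for the monitor while a submit is in its slow step, first with the slow step inside `synchronized (this)` and then with it moved outside, as proposed above.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical sketch (not JobTracker code): holding one monitor across a
 * slow step (standing in for the fork()-based chmod) delays an unrelated
 * synchronized method (standing in for heartbeat()); narrowing the critical
 * section removes that delay.
 */
class LockScopeDemo {
    // Assumed duration of the slow step; the report mentions 1-2 s.
    static final long SLOW_STEP_MILLIS = 500;

    private int submittedJobs = 0; // shared state guarded by "this"
    private int heartbeats = 0;    // shared state guarded by "this"

    /** Stand-in for slow, state-independent work (e.g. chmod via a child process). */
    private static void slowStep(CountDownLatch started) throws InterruptedException {
        started.countDown();       // signal that the slow step has begun
        Thread.sleep(SLOW_STEP_MILLIS);
    }

    /** Coarse locking: the slow step runs while holding the monitor. */
    void submitCoarse(CountDownLatch started) throws InterruptedException {
        synchronized (this) {
            slowStep(started);     // sleep() does not release the monitor
            submittedJobs++;
        }
    }

    /** Narrow locking: only shared-state access holds the monitor. */
    void submitNarrow(CountDownLatch started) throws InterruptedException {
        synchronized (this) {
            // ... checks that need tracker state would go here
        }
        slowStep(started);         // runs without holding the monitor
        synchronized (this) {
            submittedJobs++;
        }
    }

    /** Stand-in for heartbeat(): needs the same monitor. */
    synchronized void heartbeat() {
        heartbeats++;
    }

    /** Milliseconds heartbeat() waited for the monitor while a submit was in its slow step. */
    static long heartbeatDelayMillis(boolean coarse) throws InterruptedException {
        LockScopeDemo tracker = new LockScopeDemo();
        CountDownLatch started = new CountDownLatch(1);
        Thread submitter = new Thread(() -> {
            try {
                if (coarse) {
                    tracker.submitCoarse(started);
                } else {
                    tracker.submitNarrow(started);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        submitter.start();
        started.await();           // wait until the slow step is underway
        long t0 = System.nanoTime();
        tracker.heartbeat();       // blocks only if the submitter holds the monitor
        long waitedMillis = (System.nanoTime() - t0) / 1_000_000;
        submitter.join();
        return waitedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("coarse: heartbeat waited ~" + heartbeatDelayMillis(true) + " ms");
        System.out.println("narrow: heartbeat waited ~" + heartbeatDelayMillis(false) + " ms");
    }
}
```

With coarse locking the heartbeat waits for roughly the whole slow step; with the narrow version it proceeds almost immediately. Exact timings vary by machine. The narrow version is only safe if the work moved out of the monitor really does not read or mutate shared tracker state, which is the premise of the proposal.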

-- 
This message is automatically generated by JIRA.