[ https://issues.apache.org/jira/browse/MAPREDUCE-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joydeep Sen Sarma updated MAPREDUCE-2114:
-----------------------------------------
Description:
We are bound on the JobTracker lock on our largest cluster. One pattern I have
seen is the following:
- The JT acquires the JobTracker lock but is then blocked on the JIP
  (JobInProgress) lock:
    java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1028)
        waiting to lock <0x00002aae21092ff8> (a org.apache.hadoop.mapred.JobInProgress)
        at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4403)
        at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3444)
        locked <0x00002aab6ebb6640> (a org.apache.hadoop.mapred.JobTracker)
- The JIP lock is typically held by a getCounters call:
        locked <0x00002aaaf88beff8> (a org.apache.hadoop.mapred.Counters$Group)
        at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:445)
        locked <0x00002aaaf88bb948> (a org.apache.hadoop.mapred.Counters)
        at org.apache.hadoop.mapred.JobInProgress.incrementTaskCounters(JobInProgress.java:1253)
        at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:1240)
        locked <0x00002aae21092ff8> (a org.apache.hadoop.mapred.JobInProgress)
The solution seems simple. To summarize the counters for all tasks, we only
need to lock one task's counters at a time; we don't need to lock the entire
job.
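
The idea of locking one task's counters at a time can be sketched roughly as
follows. This is only an illustration under assumptions, not the actual
JobInProgress code or a proposed patch: the classes Job and TaskCounters and
all of their methods are hypothetical stand-ins invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one task's counters; not the Hadoop Counters class.
final class TaskCounters {
    private final Map<String, Long> values = new HashMap<>();

    // Status-update path: locks only this task's counters.
    synchronized void increment(String name, long delta) {
        values.merge(name, delta, Long::sum);
    }

    // Adds this task's counters into 'total' while holding only this task's lock.
    synchronized void addTo(Map<String, Long> total) {
        for (Map.Entry<String, Long> e : values.entrySet()) {
            total.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }
}

// Hypothetical stand-in for a job; not the Hadoop JobInProgress class.
final class Job {
    private final List<TaskCounters> tasks = new ArrayList<>();

    synchronized TaskCounters addTask() {
        TaskCounters t = new TaskCounters();
        tasks.add(t);
        return t;
    }

    // The job lock is held only long enough to copy the task list.
    private synchronized List<TaskCounters> snapshotTasks() {
        return new ArrayList<>(tasks);
    }

    // Not synchronized on the job: each task is locked one at a time,
    // so a concurrent status update waits for at most one task's summation.
    Map<String, Long> getCounters() {
        Map<String, Long> total = new HashMap<>();
        for (TaskCounters t : snapshotTasks()) {
            t.addTo(total);
        }
        return total;
    }
}
```

One trade-off of this approach: the total is no longer an atomic snapshot of
the whole job, since a task may be updated after its counters have already
been added. For a monitoring call such as getCounters that is probably
acceptable, but it is an assumption worth checking against the callers.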
> use finer-grained locks in JT getCounters implementation
> --------------------------------------------------------
>
> Key: MAPREDUCE-2114
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2114
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker
> Reporter: Joydeep Sen Sarma