[ https://issues.apache.org/jira/browse/MAPREDUCE-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029769#comment-15029769 ]
Prabhu Joseph commented on MAPREDUCE-6530:
------------------------------------------
JSPUtil is what spends most of the time holding the ResourceBundles lock,
through its calls to getMapCounters and getReduceCounters.
Assume 150 jobs are running at a time with 7000 tasks in total. Each task has
two counter groups, FileSystemCounter and TaskCounter, with 14 counters in
total under them.
On each refresh of the JT UI page, JSPUtil#generateJobTable() is called. For
each of the 150 jobs it calls JobInProgress#getMapCounters(), which for each
task of that job calls Counters#incrAllCounters(), which increments all 14
counters across the two groups.
In the end, the JT UI front page displays only PHYSICAL_MEMORY_BYTES and
CPU_MILLISECONDS out of the 14 counters.
getMapCounters is what causes the slowness, because it waits for the lock on
ResourceBundles. So we need to refactor the code so that it increments only
those two counters instead of all of them.
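To illustrate the idea (this is only a sketch, not a patch against the Hadoop
code): with the figures above, and assuming every task reports all 14
counters, one page refresh does roughly 7000 x 14 = 98,000 counter increments,
against 7000 x 2 = 14,000 if only the two displayed counters are summed. The
class and method names below are hypothetical, and plain maps stand in for the
real org.apache.hadoop.mapred.Counters objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for per-task counters; NOT the Hadoop Counters API.
class SelectiveCounterSum {

    // Sums only the counters named in `wanted` across all tasks of a job.
    // Counters outside `wanted` are never read, so nothing is looked up for them.
    static Map<String, Long> sumSelected(List<Map<String, Long>> taskCounters,
                                         List<String> wanted) {
        Map<String, Long> totals = new HashMap<>();
        for (String name : wanted) {
            totals.put(name, 0L);
        }
        for (Map<String, Long> task : taskCounters) {
            for (String name : wanted) {
                Long value = task.get(name);
                if (value != null) {
                    totals.put(name, totals.get(name) + value);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> tasks = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Map<String, Long> counters = new HashMap<>();
            counters.put("PHYSICAL_MEMORY_BYTES", 100L);
            counters.put("CPU_MILLISECONDS", 10L);
            counters.put("HDFS_BYTES_READ", 5L); // not shown on the front page
            tasks.add(counters);
        }
        Map<String, Long> totals = sumSelected(
            tasks, Arrays.asList("PHYSICAL_MEMORY_BYTES", "CPU_MILLISECONDS"));
        System.out.println(totals);
    }
}
```

In the real code the equivalent change would have to live where
JobInProgress aggregates task counters for the front page; where exactly is
left open here.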
> Jobtracker is slow when more JT UI requests
> -------------------------------------------
>
> Key: MAPREDUCE-6530
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6530
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 1.2.1
> Reporter: Prabhu Joseph
>
> JobTracker is slow when a huge number of jobs are running and 30
> connections are established to the info port to view job status and counters.
> hadoop job -list took 4m22.412s.
> We took jstack traces and found most of the server threads waiting on the
> JobTracker object, while the thread that holds the lock on JobTracker waits
> for the ResourceBundles class lock.
> "retireJobs" prio=10 tid=0x00007f2345200800 nid=0x11c1 waiting for monitor entry [0x00007f22e3499000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
> - waiting to lock <0x0000000197cc6218> (a java.lang.Class for org.apache.hadoop.mapreduce.util.ResourceBundles)
> at org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterName(ResourceBundles.java:89)
> at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.localizeCounterName(FrameworkCounterGroup.java:135)
> at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.access$000(FrameworkCounterGroup.java:47)
> at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup$FrameworkCounter.getDisplayName(FrameworkCounterGroup.java:75)
> at org.apache.hadoop.mapred.Counters$Counter.getDisplayName(Counters.java:130)
> at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:534)
> - locked <0x00000007f8411608> (a org.apache.hadoop.mapred.Counters)
> at org.apache.hadoop.mapred.JobInProgress.incrementTaskCounters(JobInProgress.java:1728)
> at org.apache.hadoop.mapred.JobInProgress.getMapCounters(JobInProgress.java:1669)
> at org.apache.hadoop.mapred.JobTracker$RetireJobs.addToCache(JobTracker.java:657)
> - locked <0x000000009644ae08> (a org.apache.hadoop.mapred.JobTracker$RetireJobs)
> at org.apache.hadoop.mapred.JobTracker$RetireJobs.run(JobTracker.java:769)
> - locked <0x00000000964c5550> (a org.apache.hadoop.mapred.FairScheduler)
> - locked <0x000000009644a9d0> (a java.util.Collections$SynchronizedMap)
> - locked <0x00000000962ac660> (a org.apache.hadoop.mapred.JobTracker)
> at java.lang.Thread.run(Thread.java:745)
> The ResourceBundles lock is held most of the time by the JT UI thread running
> jobtracker_jsp, which calls getMapCounters().
> "926410165@qtp-1732070199-56" daemon prio=10 tid=0x00007f232c4df000 nid=0x27c0 runnable [0x00007f22db7bf000]
> java.lang.Thread.State: RUNNABLE
> at java.lang.Throwable.fillInStackTrace(Native Method)
> at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
> - locked <0x000000061a49ede0> (a java.util.MissingResourceException)
> at java.lang.Throwable.<init>(Throwable.java:287)
> at java.lang.Exception.<init>(Exception.java:84)
> at java.lang.RuntimeException.<init>(RuntimeException.java:80)
> at java.util.MissingResourceException.<init>(MissingResourceException.java:85)
> at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1499)
> at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1322)
> at java.util.ResourceBundle.getBundle(ResourceBundle.java:1028)
> at org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37)
> at org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
> - locked <0x0000000197cc6218> (a java.lang.Class for org.apache.hadoop.mapreduce.util.ResourceBundles)
> at org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterName(ResourceBundles.java:89)
> at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.localizeCounterName(FrameworkCounterGroup.java:135)
> at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.access$000(FrameworkCounterGroup.java:47)
> at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup$FrameworkCounter.getDisplayName(FrameworkCounterGroup.java:75)
> at org.apache.hadoop.mapred.Counters$Counter.getDisplayName(Counters.java:130)
> at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:534)
> - locked <0x00000007ed1024b8> (a org.apache.hadoop.mapred.Counters)
> at org.apache.hadoop.mapred.JobInProgress.incrementTaskCounters(JobInProgress.java:1728)
> at org.apache.hadoop.mapred.JobInProgress.getMapCounters(JobInProgress.java:1669)
> at org.apache.hadoop.mapred.JSPUtil.generateJobTable(JSPUtil.java:436)
> at org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:202)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> Every job updates its counters, and all 30 UI clients read the frequently
> updated counters, which leads to JT slowness.
> With no JT UI requests, hadoop job -list completes in seconds.
> How can we fix the JT slowness when 30 sessions want to see the job
> status and counters of a huge number of jobs running at a time?
> Is there any workaround, such as JT UI caching or offloading part of the JT UI
> front page when load is heavy?
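On the caching question in the description, one possible shape is a small
time-bounded cache in front of the job table, so that 30 sessions refreshing
within the same window trigger one aggregation instead of 30. The following is
a rough sketch under that assumption. TimedCache is a hypothetical helper, not
an existing Hadoop class, and the clock is passed in only to make the
behaviour easy to check.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper, NOT part of Hadoop: keeps one computed value
// and recomputes it only after ttlMillis have elapsed.
class TimedCache<T> {
    private final Supplier<T> loader;   // expensive computation, e.g. rendering the job table
    private final long ttlMillis;       // how long a computed value stays valid
    private final LongSupplier clock;   // current time in milliseconds
    private T value;
    private long loadedAt;
    private boolean loaded;

    TimedCache(Supplier<T> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Callers contend on this cache's own lock while a reload runs,
    // and the loader is invoked at most once per ttl window.
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlMillis) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

A caller could build it as new TimedCache<>(loader, 5000L,
System::currentTimeMillis), where loader wraps the expensive page generation.
The trade-off is that the front page can be up to one window stale.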
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)