Prabhu Joseph created MAPREDUCE-6530:
----------------------------------------

             Summary: Jobtracker is slow when more JT UI requests
                 Key: MAPREDUCE-6530
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6530
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.5.1
            Reporter: Prabhu Joseph
            Priority: Blocker


JobTracker is slow when a large number of jobs are running and 30
connections are open to the info port to view job status and counters.

hadoop job -list took 4m22.412s

We took jstack traces and found most of the server threads waiting on the
JobTracker object, while the thread that holds the JobTracker lock is itself
waiting for the ResourceBundles class lock.

"retireJobs" prio=10 tid=0x00007f2345200800 nid=0x11c1 waiting for monitor entry [0x00007f22e3499000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
        - waiting to lock <0x0000000197cc6218> (a java.lang.Class for org.apache.hadoop.mapreduce.util.ResourceBundles)
        at org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterName(ResourceBundles.java:89)
        at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.localizeCounterName(FrameworkCounterGroup.java:135)
        at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.access$000(FrameworkCounterGroup.java:47)
        at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup$FrameworkCounter.getDisplayName(FrameworkCounterGroup.java:75)
        at org.apache.hadoop.mapred.Counters$Counter.getDisplayName(Counters.java:130)
        at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:534)
        - locked <0x00000007f8411608> (a org.apache.hadoop.mapred.Counters)
        at org.apache.hadoop.mapred.JobInProgress.incrementTaskCounters(JobInProgress.java:1728)
        at org.apache.hadoop.mapred.JobInProgress.getMapCounters(JobInProgress.java:1669)
        at org.apache.hadoop.mapred.JobTracker$RetireJobs.addToCache(JobTracker.java:657)
        - locked <0x000000009644ae08> (a org.apache.hadoop.mapred.JobTracker$RetireJobs)
        at org.apache.hadoop.mapred.JobTracker$RetireJobs.run(JobTracker.java:769)
        - locked <0x00000000964c5550> (a org.apache.hadoop.mapred.FairScheduler)
        - locked <0x000000009644a9d0> (a java.util.Collections$SynchronizedMap)
        - locked <0x00000000962ac660> (a org.apache.hadoop.mapred.JobTracker)
        at java.lang.Thread.run(Thread.java:745)


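The ResourceBundles class lock is held most of the time by JT web UI threads
(jobtracker_jsp), which call getMapCounters() while generating the job table.
In the trace below, the holder is constructing a MissingResourceException
inside the synchronized ResourceBundles.getValue().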


"926410165@qtp-1732070199-56" daemon prio=10 tid=0x00007f232c4df000 nid=0x27c0 runnable [0x00007f22db7bf000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Throwable.fillInStackTrace(Native Method)
        at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
        - locked <0x000000061a49ede0> (a java.util.MissingResourceException)
        at java.lang.Throwable.<init>(Throwable.java:287)
        at java.lang.Exception.<init>(Exception.java:84)
        at java.lang.RuntimeException.<init>(RuntimeException.java:80)
        at java.util.MissingResourceException.<init>(MissingResourceException.java:85)
        at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1499)
        at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1322)
        at java.util.ResourceBundle.getBundle(ResourceBundle.java:1028)
        at org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37)
        at org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
        - locked <0x0000000197cc6218> (a java.lang.Class for org.apache.hadoop.mapreduce.util.ResourceBundles)
        at org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterName(ResourceBundles.java:89)
        at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.localizeCounterName(FrameworkCounterGroup.java:135)
        at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.access$000(FrameworkCounterGroup.java:47)
        at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup$FrameworkCounter.getDisplayName(FrameworkCounterGroup.java:75)
        at org.apache.hadoop.mapred.Counters$Counter.getDisplayName(Counters.java:130)
        at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:534)
        - locked <0x00000007ed1024b8> (a org.apache.hadoop.mapred.Counters)
        at org.apache.hadoop.mapred.JobInProgress.incrementTaskCounters(JobInProgress.java:1728)
        at org.apache.hadoop.mapred.JobInProgress.getMapCounters(JobInProgress.java:1669)
        at org.apache.hadoop.mapred.JSPUtil.generateJobTable(JSPUtil.java:436)
        at org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:202)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)


Every job updates its counters, and all 30 UI clients read these frequently
updated counters, which leads to the JT slowness.

With no JT UI requests, hadoop job -list completes in seconds.

How can we fix the JT slowness when 30 sessions want to view the job status
and counters of a large number of concurrently running jobs?
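
One possible direction, as a minimal sketch rather than an actual Hadoop
patch: the traces show each getDisplayName() call taking the ResourceBundles
class lock and, when the bundle is missing, building a MissingResourceException
under that lock. Caching the resolved name (including the fallback default)
would limit that cost to the first lookup per counter. CachedCounterNames is a
hypothetical helper, the ".name" key suffix is an assumption, and the code
assumes Java 8.

```java
import java.util.MissingResourceException;
import java.util.ResourceBundle;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not part of Hadoop): caches resolved counter display names. */
final class CachedCounterNames {
    // Key is "bundleName#key"; value is the resolved name or the fallback default.
    private static final ConcurrentMap<String, String> CACHE = new ConcurrentHashMap<>();

    private CachedCounterNames() {}

    /** Returns the cached display name, resolving it only on the first request. */
    static String getDisplayName(String bundleName, String key, String defaultValue) {
        return CACHE.computeIfAbsent(bundleName + "#" + key,
                k -> lookup(bundleName, key, defaultValue));
    }

    // Slow path: may throw and catch MissingResourceException, but only once per key.
    private static String lookup(String bundleName, String key, String defaultValue) {
        try {
            ResourceBundle bundle = ResourceBundle.getBundle(bundleName);
            // Assumption: display names are stored under "<key>.name".
            return bundle.getString(key + ".name");
        } catch (MissingResourceException | ClassCastException e) {
            return defaultValue;
        }
    }
}
```

Note that the fallback is cached as well, so the default passed on the first
call for a given bundle and key is the one returned afterwards.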

Is there any workaround, such as caching in the JT UI or offloading part of
the JT UI front page when load is heavy?
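
To make the UI caching idea concrete, here is a rough sketch under stated
assumptions: a time-bounded snapshot that recomputes an expensive value (for
example, the rendered job table) at most once per interval, so that concurrent
UI requests reuse one result. TimedSnapshot is a hypothetical class, not an
existing JobTracker facility, and the code assumes Java 8.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: serves a cached value until it is older than the TTL. */
final class TimedSnapshot<T> {
    private final Supplier<T> loader;
    private final long ttlNanos;
    private T value;
    private long loadedAtNanos;
    private boolean loaded;

    TimedSnapshot(Supplier<T> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
    }

    /** Returns the cached value, reloading it if it was never loaded or has expired. */
    synchronized T get() {
        long now = System.nanoTime();
        if (!loaded || now - loadedAtNanos >= ttlNanos) {
            // Only one caller per TTL window runs the expensive loader;
            // the others wait on this object's lock and reuse the result.
            value = loader.get();
            loadedAtNanos = now;
            loaded = true;
        }
        return value;
    }
}
```

With a TTL of a few seconds, 30 clients refreshing the front page would
trigger one counter aggregation per interval instead of 30, at the cost of
showing slightly stale counters.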



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
