[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848406#comment-13848406
 ] 

viswanathan commented on MAPREDUCE-5351:
----------------------------------------

Hi Chris,

JT memory usage reaches 6.68 GB of 8.89 GB; at that point we are not able to
submit jobs and the UI does not load at all. However, I did not see any JT
OOM exceptions.

I have taken a thread dump of the JobTracker; it is as follows:

Deadlock Detection:

Can't print deadlocks:null
Thread 25817: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be 
imprecise)
 - org.apache.hadoop.hdfs.LeaseRenewer.run(int) @bci=274, line=397 (Compiled 
frame)
 - 
org.apache.hadoop.hdfs.LeaseRenewer.access$600(org.apache.hadoop.hdfs.LeaseRenewer,
 int) @bci=2, line=69 (Interpreted frame)
 - org.apache.hadoop.hdfs.LeaseRenewer$1.run() @bci=8, line=273 (Interpreted 
frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 25815: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be 
imprecise)
 - org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run() 
@bci=245, line=3000 (Compiled frame)

Locked ownable synchronizers:
    - None

Thread 25813: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be 
imprecise)
 - org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=59, line=747 
(Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=55, line=789 (Compiled 
frame)

Locked ownable synchronizers:
    - None

Thread 25812: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be 
imprecise)
 - org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=59, line=747 
(Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=55, line=789 (Compiled 
frame)

Locked ownable synchronizers:
    - None

Thread 25790: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be 
imprecise)

Thread 25788: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be 
imprecise)
 - org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=59, line=747 
(Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=55, line=789 (Compiled 
frame)

Locked ownable synchronizers:
    - None

Thread 25786: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be 
imprecise)
 - org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=59, line=747 
(Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=55, line=789 (Compiled 
frame)

Locked ownable synchronizers:
    - None

Thread 25761: (state = BLOCKED)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 
(Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=210 (Compiled frame)
 - sun.nio.ch.EPollSelectorImpl.doSelect(long) @bci=28, line=65 (Compiled frame)
 - sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=69 (Compiled 
frame)
 - sun.nio.ch.SelectorImpl.select(long) @bci=30, line=80 (Interpreted frame)
 - 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(java.nio.channels.SelectableChannel,
 int, long) @bci=46, line=332 (Interpreted frame)
 - org.apache.hadoop.net.SocketIOWithTimeout.doIO(java.nio.ByteBuffer, int) 
@bci=80, line=157 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream.read(java.nio.ByteBuffer) @bci=6, 
line=155 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream.read(byte[], int, int) @bci=7, 
line=128 (Compiled frame)
 - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=116 
(Interpreted frame)
 - org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(byte[], int, 
int) @bci=4, line=364 (Interpreted frame)
 - java.io.BufferedInputStream.fill() @bci=175, line=218 (Compiled frame)
 - java.io.BufferedInputStream.read() @bci=12, line=237 (Compiled frame)
 - java.io.DataInputStream.readInt() @bci=4, line=370 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.receiveResponse() @bci=19, line=845 
(Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=62, line=790 (Compiled 
frame)

-------------------------------------------------------------------------------------------------------------------------------

The JobTracker heap summary is as follows:

using thread-local object allocation.
Parallel GC with 10 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 10737418240 (10240.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 6488064 (6.1875MB)
   used     = 6488064 (6.1875MB)
   free     = 0 (0.0MB)
   100.0% used
From Space:
   capacity = 9764864 (9.3125MB)
   used     = 0 (0.0MB)
   free     = 9764864 (9.3125MB)
   0.0% used
To Space:
   capacity = 9764864 (9.3125MB)
   used     = 0 (0.0MB)
   free     = 9764864 (9.3125MB)
   0.0% used
PS Old Generation
   capacity = 7158300672 (6826.6875MB)
   used     = 7158240200 (6826.629829406738MB)
   free     = 60472 (0.05767059326171875MB)
   99.99915521849708% used
PS Perm Generation
   capacity = 26738688 (25.5MB)
   used     = 26428648 (25.204322814941406MB)
   free     = 310040 (0.29567718505859375MB)
   98.8404816272212% used
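
To make the numbers above easier to read, here is a small sketch that redoes
the arithmetic. The class and method names (HeapOccupancy, percentUsed,
maxOldGenBytes) are made up for illustration, and the NewRatio formula is only
the usual approximation (old generation is roughly NewRatio/(NewRatio+1) of
the max heap), so treat the result as an estimate. With MaxHeapSize = 10 GB
and NewRatio = 2 it suggests the old generation has already grown to about its
maximum size and is almost completely full, which might explain why the JT is
unresponsive even though no OOM has been logged.

```java
/** Illustrative helper; names are hypothetical and not part of Hadoop or the JDK. */
public class HeapOccupancy {

    /** Percentage of a memory pool in use, from the byte counts jmap reports. */
    public static double percentUsed(long usedBytes, long capacityBytes) {
        return usedBytes * 100.0 / capacityBytes;
    }

    /** Approximate upper bound of the old generation: NewRatio/(NewRatio+1) of max heap. */
    public static long maxOldGenBytes(long maxHeapBytes, int newRatio) {
        return maxHeapBytes * newRatio / (newRatio + 1);
    }

    public static void main(String[] args) {
        // Values copied from the heap summary above.
        long maxHeap = 10737418240L;
        long oldCapacity = 7158300672L;
        long oldUsed = 7158240200L;

        System.out.println("old gen used %      = " + percentUsed(oldUsed, oldCapacity));
        System.out.println("old gen capacity    = " + oldCapacity);
        System.out.println("approx. old gen max = " + maxOldGenBytes(maxHeap, 2));
    }
}
```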

Please help. It affects our production system.

Thanks,
Viswa


> JobTracker memory leak caused by CleanupQueue reopening FileSystem
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 1.1.2
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Critical
>             Fix For: 1-win, 1.2.1
>
>         Attachments: JobInProgress_JobHistory.patch, MAPREDUCE-5351-1.patch, 
> MAPREDUCE-5351-2.patch, MAPREDUCE-5351-addendum-1.patch, 
> MAPREDUCE-5351-addendum.patch, MAPREDUCE-5351.patch
>
>
> When a job is completed, closeAllForUGI is called to close all the cached 
> FileSystems in the FileSystem cache.  However, the CleanupQueue may run after 
> this occurs and call FileSystem.get() to delete the staging directory, adding 
> a FileSystem to the cache that will never be closed.
> People on the user-list have reported this causing their JobTrackers to OOME 
> every two weeks.
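
For reference, the ordering described in the issue can be pictured with a toy
model. This is only a sketch and not Hadoop code: ToyFsCache, closeAllFor and
simulate are hypothetical names, the cache is a plain map, and it assumes each
job uses its own cache key. It just shows that a get() arriving after the
close re-creates an entry that nothing closes afterwards, so the cache grows
by one entry per job.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the leak; ToyFsCache and its methods are hypothetical, not Hadoop classes. */
public class ToyFsCache {
    // One cached "file system" handle per job owner key.
    private final Map<Object, Object> cache = new HashMap<>();

    /** Returns the cached handle for the key, creating one if absent. */
    public Object get(Object ugiKey) {
        return cache.computeIfAbsent(ugiKey, k -> new Object());
    }

    /** Drops the handle cached for the key, standing in for a close-all-for-user call. */
    public void closeAllFor(Object ugiKey) {
        cache.remove(ugiKey);
    }

    public int size() {
        return cache.size();
    }

    /** Replays the problematic ordering for the given number of jobs; returns leaked entries. */
    public static int simulate(int jobs) {
        ToyFsCache fsCache = new ToyFsCache();
        for (int i = 0; i < jobs; i++) {
            Object ugiKey = new Object();   // assumption: each job has its own key
            fsCache.get(ugiKey);            // job runs and uses the file system
            fsCache.closeAllFor(ugiKey);    // job completes: cached handle is closed
            fsCache.get(ugiKey);            // late cleanup re-creates a handle nobody closes
        }
        return fsCache.size();
    }

    public static void main(String[] args) {
        System.out.println("leaked entries after 1000 jobs: " + simulate(1000));
    }
}
```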



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
