Hi,

> Actually I enabled logging at all levels, but I didn't realize I should check the
> logs in the .out files and only looked at the .log file, so I didn't see any error
> messages. Now I have opened the .out file and see the following exceptions:
>
> Exception in thread "IPC Server handler 5 on 50002" java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2786)
>         at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:133)
>         at org.apache.hadoop.ipc.Server.setupResponse(Server.java:1087)
>         at org.apache.hadoop.ipc.Server.access$2400(Server.java:77)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:984)
>
> Exception in thread "IPC Client (47) connection to servername/serverip:50001 from auser" java.lang.OutOfMemoryError: Java heap space
>         at java.nio.ByteBuffer.wrap(ByteBuffer.java:350)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.FilterInputStream.read(FilterInputStream.java:116)
>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>
> Exception in thread "LeaseChecker" java.lang.OutOfMemoryError: GC overhead limit exceeded
> Exception in thread "IPC Server handler 4 on 50002" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "IPC Server handler 8 on 50002" java.lang.OutOfMemoryError: GC overhead limit exceeded
> Exception in thread "IPC Server handler 9 on 50002" java.lang.OutOfMemoryError: GC overhead limit exceeded
> Exception in thread "IPC Server handler 2 on 50002" java.lang.OutOfMemoryError: GC overhead limit exceeded
> Exception in thread "IPC Server handler 6 on 50002" java.lang.OutOfMemoryError: GC overhead limit exceeded
> Exception in thread "ResponseProcessor for block blk_-573819335330670501_1268378" java.lang.OutOfMemoryError: GC overhead limit exceeded
> Exception in thread "IPC Server handler 7 on 50002" java.lang.OutOfMemoryError: Java heap space
>
> Ports 50001 and 50002 are used for DFS and the JobTracker. But is it normal for the
> IPC threads to consume this much memory?
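One point worth flagging from the above: uncaught exceptions such as these OOMs land in the daemon's .out file (the stdout/stderr capture), not in the log4j .log file, so it pays to grep both. A minimal sketch, assuming the /var/log/hadoop directory mentioned later in the thread and the default hadoop-daemon.sh file naming (hadoop-<user>-jobtracker-<host>.log / .out); adjust the paths for your install:

    # log4j output -- the file most people check first
    grep -i OutOfMemoryError /var/log/hadoop/hadoop-*-jobtracker-*.log

    # stdout/stderr capture -- uncaught exceptions like the OOMs above end up here
    grep -i OutOfMemoryError /var/log/hadoop/hadoop-*-jobtracker-*.out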
It may not be the IPC threads that are causing the problem, though I remember at least one IPC-buffer-related OOM somewhere. The more typical case is that the daemon itself is running out of memory and the IPC threads are simply where the error surfaces, because they are the entry points into the application. I think we still need to go down the path Alex was suggesting. How much heap have you given the JobTracker? What is the number of jobs / tasks / users? How many nodes are in the cluster?

> --- On Fri, 7/30/10, Alex Loddengaard <a...@cloudera.com> wrote:
>
> From: Alex Loddengaard <a...@cloudera.com>
> Subject: Re: jobtracker.jsp reports "GC overhead limit exceeded"
> To: common-user@hadoop.apache.org
> Date: Friday, July 30, 2010, 2:19 PM
>
> What does "ps" show you? How much memory is being used by the jobtracker,
> and how large is its heap (look for HADOOP_HEAPSIZE in hadoop-env.sh)? Also
> consider turning on GC logging, which will find its way to the jobtracker
> .out log in /var/log/hadoop:
>
> <http://java.sun.com/developer/technicalArticles/Programming/GCPortal/>
>
> Alex
>
> On Fri, Jul 30, 2010 at 3:10 PM, jiang licht <licht_ji...@yahoo.com> wrote:
>
>> http://server:50030/jobtracker.jsp generates the following error message:
>>
>> HTTP ERROR: 500
>>
>> GC overhead limit exceeded
>>
>> RequestURI=/jobtracker.jsp
>>
>> Caused by:
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> Powered by Jetty://
>>
>> The jobtracker is running below the limit, but "hadoop job -status" seems
>> to hang and does not respond ...
>>
>> The last 2 lines of the jobtracker log:
>>
>> 2010-07-30 13:53:18,482 DEBUG org.apache.hadoop.mapred.JobTracker: Got
>> heartbeat from: tracker_host1:localhost.localdomain/127.0.0.1:53914
>> (restarted: false initialContact: false acceptNewTasks: true) with
>> responseId: -31252
>> 2010-07-30 13:55:32,917 DEBUG org.apache.hadoop.mapred.JobTracker: Starting
>> launching task sweep
>>
>> Any thoughts about this?
>>
>> Thanks!
>> --Michael
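For reference, a minimal sketch of the checks Alex describes, assuming a 0.20-era conf/hadoop-env.sh. The heap value is only illustrative, and the HADOOP_JOBTRACKER_OPTS variable name is taken from the stock hadoop-env.sh template, so adapt both to your installation:

    # See how much memory the JobTracker process is using and which -Xmx it was started with
    ps aux | grep org.apache.hadoop.mapred.JobTracker | grep -v grep

    # In conf/hadoop-env.sh: heap size (in MB, default 1000) passed as -Xmx to the
    # Hadoop daemons started on this host
    export HADOOP_HEAPSIZE=2000

    # JobTracker-only JVM options: verbose GC output goes to stdout, which
    # hadoop-daemon.sh redirects into the jobtracker .out file
    export HADOOP_JOBTRACKER_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps $HADOOP_JOBTRACKER_OPTS"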