[ https://issues.apache.org/jira/browse/HADOOP-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610334#action_12610334 ]
Christian Kunz commented on HADOOP-3670:
----------------------------------------

Because of the suspicion that GC was badly configured, I restarted the JobTracker in 32-bit mode with the default configuration, but with the options suggested by Owen:

    HADOOP_OPTS="-server -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError"
    HADOOP_HEAPSIZE=2500
    RAM: 8GB

The cluster has 200 nodes. Jobs typically have at most 4000 maps and fewer than 400 reduces, but often 2 or 3 jobs run simultaneously.

The JobTracker's memory footprint increased slowly to nearly 2.4GB, and then, after about 100 jobs, a new job initialization failed:

    Job initialization failed:
    java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:97)
        at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:76)
        at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:131)
        at org.apache.hadoop.mapred.JobClient$RawSplit.readFields(JobClient.java:797)
        at org.apache.hadoop.mapred.JobClient.readSplitFile(JobClient.java:863)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:308)
        at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:418)
        at java.lang.Thread.run(Thread.java:619)

From then on the JobTracker became unresponsive, running GC at full speed. Unfortunately, for some reason, I could not find a heap dump file.

Typical GC timestamp output (it looks as if, from a certain point on, full GC is running repeatedly with hardly any gain):

    33015.775: [GC [PSYoungGen: 154229K->55555K(188352K)] 2344017K->2254659K(2463936K), 0.0688110 secs]
    33027.318: [GC [PSYoungGen: 149123K->54221K(189632K)] 2348227K->2259655K(2465216K), 0.0603560 secs]
    33046.658: [GC [PSYoungGen: 149069K->18692K(189632K)] 2354503K->2259525K(2465216K), 0.0683130 secs]
    33056.766: [GC [PSYoungGen: 113537K->20288K(189632K)] 2354370K->2269026K(2465216K), 0.0415790 secs]
    33056.808: [Full GC [PSYoungGen: 20288K->0K(189632K)] [PSOldGen: 2248737K->2268912K(2275584K)] 2269026K->2268912K(2465216K) [PSPermGen: 11448K->11448K(16384K)], 1.7332610 secs]
    33081.667: [Full GC [PSYoungGen: 94848K->0K(189632K)] [PSOldGen: 2268912K->2272832K(2275584K)] 2363760K->2272832K(2465216K) [PSPermGen: 11448K->11448K(16384K)], 1.7537480 secs]
    33096.646: [Full GC [PSYoungGen: 94848K->0K(189632K)] [PSOldGen: 2272832K->2262529K(2275584K)] 2367680K->2262529K(2465216K) [PSPermGen: 11448K->11443K(16384K)], 3.2210170 secs]
    33120.150: [Full GC [PSYoungGen: 94848K->0K(189632K)] [PSOldGen: 2262529K->2267044K(2275584K)] 2357377K->2267044K(2465216K) [PSPermGen: 11443K->11443K(16384K)], 1.7487610 secs]
    33136.949: [Full GC [PSYoungGen: 94848K->0K(189632K)] [PSOldGen: 2267044K->2272689K(2275584K)] 23618

> JobTracker running out of heap space
> ------------------------------------
>
>                 Key: HADOOP-3670
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3670
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Christian Kunz
>
> The JobTracker on our 0.17.0 installation runs out of heap space rather
> quickly, with less than 100 jobs (at one time even after just 16 jobs).
> Running in 64-bit mode with larger heap space does not help -- it will use up
> all available RAM.
>
> 2008-06-28 05:17:06,661 INFO org.apache.hadoop.ipc.Server: IPC Server handler 62 on 9020, call heartbeat([EMAIL PROTECTED], false, true, 17384) from xxx.xxx.xxx.xxx:51802: error: java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
> java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
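
The "hardly any gain" reading of the GC output in the comment above can be checked mechanically. The following is a minimal Python sketch, assuming the log format produced by -XX:+PrintGCDetails -XX:+PrintGCTimeStamps with the parallel collector, as quoted in the comment. The function name `full_gc_old_gen` and the `SAMPLE` constant are hypothetical helpers introduced only for illustration; the two sample records are copied from the comment, with the wrapped numbers rejoined.

```python
import re

# One "Full GC" record as printed with -XX:+PrintGCDetails (parallel collector).
# Captures: timestamp in seconds, old-gen KB before, old-gen KB after,
# and old-gen capacity in KB.
FULL_GC = re.compile(
    r"(\d+\.\d+): \[Full GC .*?\[PSOldGen: (\d+)K->(\d+)K\((\d+)K\)\]"
)


def full_gc_old_gen(log_text):
    """Return (timestamp, before_kb, after_kb, capacity_kb, occupancy_after)
    for every Full GC record found in log_text."""
    records = []
    for m in FULL_GC.finditer(log_text):
        ts = float(m.group(1))
        before, after, cap = (int(m.group(i)) for i in (2, 3, 4))
        # Fraction of the old generation still in use after the collection.
        records.append((ts, before, after, cap, after / cap))
    return records


# Two records taken from the GC output quoted in the comment.
SAMPLE = """\
33056.808: [Full GC [PSYoungGen: 20288K->0K(189632K)] [PSOldGen: 2248737K->2268912K(2275584K)] 2269026K->2268912K(2465216K) [PSPermGen: 11448K->11448K(16384K)], 1.7332610 secs]
33096.646: [Full GC [PSYoungGen: 94848K->0K(189632K)] [PSOldGen: 2272832K->2262529K(2275584K)] 2367680K->2262529K(2465216K) [PSPermGen: 11448K->11443K(16384K)], 3.2210170 secs]
"""

for ts, before, after, cap, occ in full_gc_old_gen(SAMPLE):
    print(f"{ts:.3f}s old gen after Full GC: {after}K of {cap}K ({occ:.1%})")
```

If the old generation stays almost full after each Full GC, as in these records, the collector is spending its time on live data. That is consistent with the "GC overhead limit exceeded" error in the original report, but it does not by itself say which objects are being retained.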