I've been trying to troubleshoot an OOME we've been having.

When we run the job over a dataset that is about 700GB (~9000 files) or larger,
we get an OOME on the map jobs.  However, if we run the job over a smaller
subset of the data, everything works fine.  So my question is: what
changes in Hadoop as the size of the input set increases?

We are on Hadoop 0.18.0.

Here is a stack trace produced by the job tracker:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuffer.append(StringBuffer.java:224)
    at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getNodeValueString(DeferredDocumentImpl.java:1167)
    at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getNodeValueString(DeferredDocumentImpl.java:1120)
    at com.sun.org.apache.xerces.internal.dom.DeferredTextImpl.synchronizeData(DeferredTextImpl.java:93)
    at com.sun.org.apache.xerces.internal.dom.CharacterDataImpl.getData(CharacterDataImpl.java:160)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:928)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:851)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:819)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:278)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:446)
    at org.apache.hadoop.mapred.JobConf.getKeepFailedTaskFiles(JobConf.java:308)
    at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.setJobConf(TaskTracker.java:1506)
    at org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:727)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:721)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1306)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:946)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1343)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2354)


Thanks,
Philip.
