On Dec 17, 2008, at 10:44 AM, Philip wrote:
I've been trying to troubleshoot an OOME we've been having.
When we run the job over a dataset that is about 700GB (~9000 files)
or larger, we get an OOME in the map tasks. However, if we run the
job over a smaller subset of the data, everything works fine. So my
question is: what changes in Hadoop as the size of the input set
increases?
We are on hadoop 0.18.0.
I suspect the reason is that larger datasets result in more map
tasks, and we seem to have a memory leak in the TaskTracker that
grows with the number of maps run on a given TaskTracker.
I've opened https://issues.apache.org/jira/browse/HADOOP-4906 to track
this.
As a workaround you could try increasing the heap size for the
TaskTracker via HADOOP_TASKTRACKER_OPTS in conf/hadoop-env.sh.
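For example (the 2048 MB figure below is illustrative only, not a
recommendation; size it to your hardware and workload):

```shell
# conf/hadoop-env.sh
# Raise the maximum JVM heap for the TaskTracker daemon.
# 2048m is an example value -- tune it for your nodes.
export HADOOP_TASKTRACKER_OPTS="-Xmx2048m"
```

The TaskTracker must be restarted on each node for the change to take
effect.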
Arun