Vijay Murthi wrote:
I am trying to understand what happens between the time the map tasks finish and the
reduce tasks start executing. I have 2 machines, each with 4 processors and 4 GB of
RAM, using NFS (not DFS) to process 50 GB of data. The map tasks finish successfully.
After that I see the following in the tasktracker log:
"Exception in thread "Server handler 1 on 50040" java.lang.OutOfMemoryError: Java
heap space"
Are you running the current trunk? My guess is that you are. If so,
then this error is "normal" and things should keep running.
Are you running a 64-bit kernel? If not, can it really take advantage
of all 4GB? In my experience, 32-bit JVMs can't effectively use more
than around 1.5GB, and a 32-bit kernel can't effectively use all 4GB,
but I may be wrong on that last count.
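A quick way to see how much heap the JVM will actually grant is to ask the runtime directly; this is a plain-Java sketch (no Hadoop needed), and on a 32-bit JVM the reported maximum typically tops out well below 4GB regardless of -Xmx:

```java
// Prints the maximum heap this JVM will use. Run with the same -Xmx
// you pass via mapred.child.java.opts to see what is really available.
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Usable heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```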
Listed below are the configuration parameters. Am I setting the Java heap size too low
compared to io.sort.mb or the file buffer size? I thought the tasktracker just pushes
the job to the child node; does this happen because of something like moving data? If
so, is there a buffer size limit I can set? Also, I noticed that under the mapred local
directories, the reduce files keep growing even after the tasktracker hits the
out-of-memory error.
Sorting does indeed happen in the child process.
4MB buffers for file streams seems large to me.
You might increase the io.sort.factor. With 500MB for sorting and a
sort factor of 100, each sort stream would get a 5MB buffer, plenty to
ensure that transfer time dominates seek, since the break-even point is
around 100kB. So you could even use a sort factor of 500. That would
make sorts a lot faster.
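Put together, the suggested changes might look like the following (a sketch in the same name/value style as the config dump below; the exact values are illustrative, not prescriptive):

```xml
<name>io.sort.factor</name>
<value>100</value>      <!-- up from 10: 500MB / 100 streams = 5MB per stream -->
<name>io.file.buffer.size</name>
<value>65536</value>    <!-- down from 4096000; 4MB per file stream is large -->
```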
Also why are you setting the task timeout so high? Do you have mappers
or reducers that take a long time per entry and are not calling
Reporter.setStatus() regularly? That can cause tasks to time out.
Doug
-------------------------------------------------------------------
<name>io.sort.factor</name>
<value>10</value>
<name>io.sort.mb</name>
<value>500</value>
<name>io.skip.checksum.errors</name>
<value>false</value>
<name>io.file.buffer.size</name>
<value>4096000</value>
<name>mapred.reduce.tasks</name>
<value>6</value>
<name>mapred.task.timeout</name>
<value>100000000000</value>
<name>mapred.tasktracker.tasks.maximum</name>
<value>3</value>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
<name>mapred.combine.buffer.size</name>
<value>100000</value>
<name>mapred.speculative.execution</name>
<value>true</value>
<name>ipc.client.timeout</name>
<value>60000</value>
------------------------------------------------------------
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=1024
------------------------------------------------------------