I am trying to understand what happens in the interval between the map tasks 
finishing and the reduce tasks starting to execute. I have 2 machines, each 
with 4 processors and 4 GB of RAM, using NFS (not DFS) to process 50 GB of 
data. The map tasks all complete successfully. After that I see the following 
in the tasktracker log:

"Exception in thread "Server handler 1 on 50040" java.lang.OutOfMemoryError: 
Java heap space"


Listed below are my configuration parameters. Am I setting the Java heap too 
low relative to io.sort.mb or io.file.buffer.size? I thought the tasktracker 
just hands the task off to a child JVM; does it also buffer data itself, for 
example while moving map output to the reducers? If so, is there a buffer size 
I can limit? Also, I noticed that under the mapred local directories, the 
files for the reduce tasks keep growing even after the tasktracker hits the 
OutOfMemoryError.
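For reference, here is my back-of-envelope math from the values below (only a 
sketch; the assumption that a merge holds io.sort.factor streams open at once, 
each buffered at io.file.buffer.size, is mine and may not match what Hadoop 
actually does):

-------------------------------------------------------------------
// Rough heap-budget arithmetic from my own config values.
// ASSUMPTION (mine, unverified): an io.sort.factor-way merge keeps
// that many streams open, each with an io.file.buffer.size buffer.
public class HeapBudget {
    public static void main(String[] args) {
        int  ioSortMb     = 500;         // io.sort.mb, already in MB
        int  ioSortFactor = 10;          // io.sort.factor
        long fileBufferB  = 4_096_000L;  // io.file.buffer.size, in bytes
        int  childHeapMb  = 1024;        // mapred.child.java.opts -Xmx1024m

        double mergeMb = ioSortFactor * fileBufferB / (1024.0 * 1024.0);
        // ~39 MB of stream buffers during a 10-way merge
        System.out.printf("merge stream buffers: ~%.0f MB%n", mergeMb);
        // ~539 MB of the 1024 MB child heap before any user code runs
        System.out.printf("sort + merge buffers per child: ~%.0f of %d MB%n",
                ioSortMb + mergeMb, childHeapMb);
    }
}
-------------------------------------------------------------------

By that math each child looks tight but still within its 1024 MB, which is why 
a failure in the tasktracker's own JVM confuses me.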

Any feedback would be appreciated.

Thanks,
VJ



-------------------------------------------------------------------
  <property>
    <name>io.sort.factor</name>
    <value>10</value>
  </property>

  <property>
    <name>io.sort.mb</name>
    <value>500</value>
  </property>

  <property>
    <name>io.skip.checksum.errors</name>
    <value>false</value>
  </property>

  <property>
    <name>io.file.buffer.size</name>
    <value>4096000</value>
  </property>

  <property>
    <name>mapred.reduce.tasks</name>
    <value>6</value>
  </property>

  <property>
    <name>mapred.task.timeout</name>
    <value>100000000000</value>
  </property>

  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>3</value>
  </property>

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

  <property>
    <name>mapred.combine.buffer.size</name>
    <value>100000</value>
  </property>

  <property>
    <name>mapred.speculative.execution</name>
    <value>true</value>
  </property>

  <property>
    <name>ipc.client.timeout</name>
    <value>60000</value>
  </property>

------------------------------------------------------------
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=1024
------------------------------------------------------------
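
If it matters, the two changes I am considering trying next (just guesses on 
my part, not verified fixes) are shrinking io.file.buffer.size back toward the 
4096-byte default and giving the daemon JVMs more headroom:

-------------------------------------------------------------------
<!-- hadoop-site.xml: candidate change, back to the 4096-byte default -->
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>

# hadoop-env.sh: candidate change, more heap for the daemons
export HADOOP_HEAPSIZE=1536
-------------------------------------------------------------------

Though with only 4 GB per node and -Xmx1024m children, I am not sure how much 
more heap I can actually afford.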
