I am trying to understand what happens during the period after the map tasks finish and before the reduce tasks start executing. I have 2 machines, each with 4 processors and 4 GB of RAM, using NFS (not DFS) to process 50 GB of data. The map tasks finish successfully. After that I see the following in the tasktracker log.
"Exception in thread "Server handler 1 on 50040" java.lang.OutOfMemoryError: Java heap space" Lister below is the configuration parameter. Am I setting JAVA memory heap very low compared to io.sort.mb or file buffer size? I thought Tasktracker just pushes the job to the child node, does it because of something like moving data ? If so is there a buffer size I can set a limit? Also, I noticed on mapred local each under the directotries for reduce files start growing even after tasktracker has "out of memory error". Any feedback would be appreciated. Thanks, VJ ------------------------------------------------------------------- <name>io.sort.factor</name> <value>10</value> <name>io.sort.mb</name> <value>500</value> <name>io.skip.checksum.errors</name> <value>false</value> <name>io.file.buffer.size</name> <value>4096000</value> <name>mapred.reduce.tasks</name> <value>6</value> <name>mapred.task.timeout</name> <value>100000000000</value> <name>mapred.tasktracker.tasks.maximum</name> <value>3</value> <name>mapred.child.java.opts</name> <value>-Xmx1024m</value> <name>mapred.combine.buffer.size</name> <value>100000</value> <name>mapred.speculative.execution</name> <value>true</value> <name>ipc.client.timeout</name> <value>60000</value> ------------------------------------------------------------ # The maximum amount of heap to use, in MB. Default is 1000. export HADOOP_HEAPSIZE=1024 ------------------------------------------------------------
