java.lang.OutOfMemoryError occurred while running the high ram streaming job.
-----------------------------------------------------------------------------
Key: MAPREDUCE-2211
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2211
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/streaming
Reporter: Vinay Kumar Thota
I generated 3 GB of input data using the random text writer, then submitted a high-RAM streaming job from the command line. One of the reducer task attempts failed with an out-of-memory error.
To reproduce the issue, follow the steps below.
1. Run the following command to generate the input data.
${HADOOP_HOME}/bin/hadoop jar \
${HADOOP_HOME}/hadoop-mapred-examples-0.22.0-SNAPSHOT.jar randomtextwriter \
-D mapreduce.randomtextwriter.totalbytes=3221225472 \
-D mapreduce.randomtextwriter.bytespermap=$((3221225472 / 10)) \
-D mapreduce.randomtextwriter.minwordskey=1 \
-D mapreduce.randomtextwriter.maxwordskey=10 \
-D mapreduce.randomtextwriter.minwordsvalue=0 \
-D mapreduce.randomtextwriter.maxwordsvalue=50 \
-D mapred.output.compress=false \
-D mapreduce.jobtracker.maxmapmemory.mb=1024 \
-D mapreduce.jobtracker.maxreducememory.mb=1024 \
-D mapreduce.cluster.mapmemory.mb=800 \
-D mapreduce.cluster.reducememory.mb=800 \
-D mapreduce.map.memory.mb=2048 \
-D mapreduce.reduce.memory.mb=2048 \
-outFormat org.apache.hadoop.mapreduce.lib.output.TextOutputFormat \
highramjob_unsort_input
2. Run the following command to submit the streaming job.
$HADOOP_HOME/bin/hadoop jar \
${HADOOP_HOME}/contrib/streaming/hadoop-0.22.0-SNAPSHOT-streaming.jar \
-D mapreduce.jobtracker.maxmapmemory.mb=1024 \
-D mapreduce.jobtracker.maxreducememory.mb=1024 \
-D mapreduce.cluster.mapmemory.mb=800 \
-D mapreduce.cluster.reducememory.mb=800 \
-D mapreduce.map.memory.mb=2048 \
-D mapreduce.reduce.memory.mb=2048 \
-D mapreduce.job.name="StreamingWordCount" \
-input highramjob_unsort_input \
-output highramjob_output1 \
-mapper cat \
-reducer wc
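Note that the commands above only raise the slot request (`mapreduce.reduce.memory.mb=2048`); they do not change the reducer JVM heap, which stays at its default `-Xmx`. As a hedged workaround sketch (the property names below are assumptions based on the 0.21+ renamed configuration keys, and the values are illustrative, not tuned), the child heap and the shuffle's in-memory buffer fraction could be adjusted by adding:

```
-D mapreduce.reduce.java.opts=-Xmx1024m \
-D mapreduce.reduce.shuffle.input.buffer.percent=0.50 \
```

If these keys are not honored on the 0.22 trunk build, the older `mapred.child.java.opts` / `mapred.job.shuffle.input.buffer.percent` names may apply instead.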
I am using a 10-node secure cluster running the 0.22 trunk branch.
Error details:
==========
2010-12-07 06:32:39,963 WARN org.apache.hadoop.mapred.Child: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:223)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:217)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
    at org.apache.hadoop.mapreduce.task.reduce.MapOutput.<init>(MapOutput.java:104)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManager.unconditionalReserve(MergeManager.java:267)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManager.reserve(MergeManager.java:257)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:305)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:251)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
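The failing allocation happens when MergeManager.reserve decides a fetched map output fits in memory and unconditionalReserve backs it with a new BoundedByteArrayOutputStream. A minimal sketch of that sizing/placement decision is below; the threshold names, the 0.70 and 0.25 defaults, and the decision order are assumptions based on 0.22-era code, not the exact implementation:

```java
public class ShuffleMemorySketch {
    // Where a fetched map output ends up.
    enum Placement { MEMORY, DISK, WAIT }

    // In-memory shuffle budget: a fraction of the reducer's max heap
    // (assumed default 0.70, cf. *.shuffle.input.buffer.percent).
    static long memoryLimit(long maxHeapBytes, double inputBufferPercent) {
        return (long) (maxHeapBytes * inputBufferPercent);
    }

    // Largest single map output allowed in memory: a fraction of the
    // budget (assumed default 0.25, cf. *.shuffle.memory.limit.percent).
    static long maxSingleShuffleLimit(long memLimit, double limitPercent) {
        return (long) (memLimit * limitPercent);
    }

    // Sketch of reserve(): oversized segments go to disk, a full budget
    // makes the fetcher wait, otherwise a byte[] of requestedSize is
    // allocated immediately (the allocation seen in the stack trace).
    static Placement reserve(long requestedSize, long usedMemory,
                             long memLimit, long singleLimit) {
        if (requestedSize > singleLimit) return Placement.DISK;
        if (usedMemory > memLimit) return Placement.WAIT;
        return Placement.MEMORY;
    }

    public static void main(String[] args) {
        long heap = 200L * 1024 * 1024; // e.g. a 200 MB child heap
        long limit = memoryLimit(heap, 0.70);
        long single = maxSingleShuffleLimit(limit, 0.25);
        System.out.println("budget=" + limit + " singleLimit=" + single);
        // Several concurrent fetchers can each pass the usedMemory check
        // while usage is just under the budget, so their combined
        // allocations can exceed what the heap can actually hold.
        System.out.println(reserve(single, limit - 1, limit, single));
    }
}
```

If that accounting is what admits more in-flight segments than the heap can hold, raising the reducer heap or lowering the buffer fraction would mask the symptom, but the reserve-side bookkeeping would be the root cause to inspect.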