[ https://issues.apache.org/jira/browse/HADOOP-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463379 ]
Albert Strasheim commented on HADOOP-817:
-----------------------------------------

I'm running 0.10.0 which should include the patch from HADOOP-849 (as far as I can tell), but I'm still running into OutOfMemoryErrors.

I'm following the example from this blog entry:

http://jjinux.blogspot.com/2007/01/clustering-hadoop.html

hadoop-default.xml contains:

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/state/partition1/tmp/hadoop-${user.name}</value>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>dominatrix.local:54310</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>dominatrix.local:54311</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
    </configuration>

mapred settings are unchanged.

I'm using an input file generated as follows:

    perl -e 'for $i(1..99999999) { print "$i\t\n"; }' > input.txt

This generates a 945 MB input file.

I then run:

    hadoop-0.10.0/bin/hadoop jar hadoop-0.10.0/contrib/hadoop-streaming.jar \
        -mapper mapper.py -reducer reducer.py \
        -input input.txt -output out-dir

I am running the job across 21 nodes.

What next? How do I debug this problem further?

I'll try Java 6 in the mean time.

> Streaming reducers throw OutOfMemory for not so large inputs
> ------------------------------------------------------------
>
>                 Key: HADOOP-817
>                 URL: https://issues.apache.org/jira/browse/HADOOP-817
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Sanjay Dahiya
>         Assigned To: Sanjay Dahiya
>         Attachments: NetbeansProfie.png
>
>
> I am seeing OutOfMemoryError for moderate size inputs (~70 text files, 20k
> each) causing job to fail in streaming. For very small inputs it still
> succeeds. Looking into details.

--
This message is automatically generated by JIRA.
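
As a sanity check on the input size quoted in the comment, the output of the perl one-liner can be computed without generating the file. This is only a minimal sketch, assuming a one-byte tab and a one-byte newline per line (Unix line endings); the function name expected_size_bytes is made up for illustration and appears nowhere in the report.

```python
def expected_size_bytes(last: int) -> int:
    """Bytes written by: perl -e 'for $i(1..last) { print "$i\\t\\n"; }'

    Assumes each line is the decimal digits of i, one tab byte and
    one newline byte (hypothetical helper, not part of Hadoop).
    """
    total = 0
    digits = 1
    start = 1
    while start <= last:
        # Last number that still has this many decimal digits.
        end = min(last, 10 ** digits - 1)
        count = end - start + 1
        total += count * (digits + 2)  # digits + tab + newline
        start = end + 1
        digits += 1
    return total


if __name__ == "__main__":
    size = expected_size_bytes(99_999_999)
    print(size, "bytes, about", round(size / 2 ** 20, 1), "MiB")
```

Under those assumptions the file comes to 988,888,887 bytes, or about 943 MiB, which is in line with the roughly 945 MB reported above.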