[
https://issues.apache.org/jira/browse/HADOOP-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463379
]
Albert Strasheim commented on HADOOP-817:
-----------------------------------------
I'm running 0.10.0, which should include the patch from HADOOP-849 (as far as I
can tell), but I'm still running into OutOfMemoryErrors.
I'm following the example from this blog entry:
http://jjinux.blogspot.com/2007/01/clustering-hadoop.html
hadoop-default.xml contains:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/state/partition1/tmp/hadoop-${user.name}</value>
</property>
<property>
<name>fs.default.name</name>
<value>dominatrix.local:54310</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>dominatrix.local:54311</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
The other mapred settings are unchanged.
I'm using an input file generated as follows:
perl -e 'for $i(1..99999999) { print "$i\t\n"; }' > input.txt
This generates a 945 MB input file. I then run:
hadoop-0.10.0/bin/hadoop jar hadoop-0.10.0/contrib/hadoop-streaming.jar \
  -mapper mapper.py -reducer reducer.py -input input.txt -output out-dir
I am running the job across 21 nodes.
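As a sanity check on the input size, the byte count produced by the perl one-liner can be worked out directly: each line is the decimal digits of $i plus a tab and a newline. The sketch below is only that arithmetic; the function name is made up for illustration. For 99999999 lines it gives 988,888,887 bytes, or roughly 943 MiB, which is in line with the size quoted above.

```python
def generated_size_bytes(n):
    """Bytes written by printing "$i\t\n" for every i in 1..n."""
    total = 0
    digits = 1   # number of decimal digits in the current range
    start = 1    # first integer with that many digits
    while start <= n:
        end = min(n, start * 10 - 1)     # last integer with `digits` digits
        count = end - start + 1
        total += count * (digits + 2)    # digits + tab + newline
        start *= 10
        digits += 1
    return total


if __name__ == "__main__":
    size = generated_size_bytes(99999999)
    print(size, "bytes")
    print(round(size / 2**20, 1), "MiB")
    print(round(size / 21 / 2**20, 1), "MiB per node if split evenly over 21 nodes")
```

The per-node figure assumes an even split, which the job is not guaranteed to produce.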
What next? How do I debug this problem further? I'll try Java 6 in the
meantime.
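One way to rule out the scripts themselves would be to run the same map, sort, reduce sequence locally on a small slice of the input, outside Hadoop. The sketch below is an assumption-laden stand-in: the mapper and reducer here are hypothetical (a pass-through mapper and a per-key counter), not the mapper.py and reducer.py from the blog entry, and the in-memory sort only imitates the shuffle for small samples.

```python
from itertools import groupby


def mapper(lines):
    """Hypothetical mapper: split each "key<TAB>value" line into a pair."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, value


def reducer(pairs):
    """Hypothetical reducer: count values per key.

    It consumes key-sorted pairs one group at a time, so it never holds
    more than the current key's running count in memory.
    """
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(1 for _ in group)


def run_local(lines):
    """Imitate a streaming job on a small sample: map, sort by key, reduce."""
    shuffled = sorted(mapper(lines))   # stands in for the framework's sort
    return list(reducer(shuffled))


if __name__ == "__main__":
    sample = ["1\t\n", "2\t\n", "1\t\n"]
    for key, count in run_local(sample):
        print(f"{key}\t{count}")
```

If a reducer written in this streaming style runs locally with flat memory use, that would point toward the Java side of the streaming job rather than the scripts.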
> Streaming reducers throw OutOfMemory for not so large inputs
> ------------------------------------------------------------
>
> Key: HADOOP-817
> URL: https://issues.apache.org/jira/browse/HADOOP-817
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/streaming
> Reporter: Sanjay Dahiya
> Assigned To: Sanjay Dahiya
> Attachments: NetbeansProfie.png
>
>
> I am seeing OutOfMemoryError for moderate-size inputs (~70 text files, 20k
> each), causing the job to fail in streaming. For very small inputs it still
> succeeds. Looking into details.