I actually solved the problem by increasing a parameter in hadoop-site.xml, since the default wasn't sufficient:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

Thanks,
Ryan

On Sun, Sep 21, 2008 at 12:59 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> Yes I did, but that didn't solve my problem, since I'm working with a
> fairly large data set (8 GB).
>
> Thanks,
> Ryan
>
>
> On Sep 21, 2008, at 12:22 AM, Sandy <[EMAIL PROTECTED]> wrote:
>
>> Have you increased the heapsize in conf/hadoop-env.sh to 2000? This
>> helped me some, but eventually I had to upgrade to a system with more
>> memory.
>>
>> -SM
>>
>>
>> On Sat, Sep 20, 2008 at 9:07 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
>>
>>> Hello all,
>>>
>>> I'm setting up a small 3-node Hadoop cluster (one node for the
>>> namenode/jobtracker and the other two for datanodes/tasktrackers).
>>> The map tasks finish fine, but the reduce tasks fail at about 30%
>>> with an out-of-memory error. My guess is that the amount of data I'm
>>> crunching through just won't fit in memory during the reduce tasks on
>>> two machines (a maximum of 2 reduce tasks on each machine). Is this
>>> expected? If I had a larger Hadoop cluster, I could increase the
>>> number of reduce tasks on each machine so that not all of the data is
>>> processed in just 4 JVMs on two machines, as in my current setup,
>>> correct? Is there any way to get the reduce tasks to not try to hold
>>> all of the data in memory, or is my only option to add more nodes to
>>> the cluster and thereby increase the number of reduce tasks?
>>>
>>> Thanks!
>>>
>>> Ryan
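
For anyone hitting the same OutOfMemoryError, here is a minimal sketch of the settings discussed in this thread, assuming a 0.18-era configuration. The "heapsize" Sandy mentions corresponds to HADOOP_HEAPSIZE in conf/hadoop-env.sh and only sizes the Hadoop daemon JVMs, which is likely why raising it alone did not help the reduce tasks; the child-task heap is what mapred.child.java.opts controls. The slot and reducer-count property names (mapred.tasktracker.reduce.tasks.maximum, mapred.reduce.tasks) are not spelled out above and are taken from that era's default configuration, so verify them against your Hadoop version. Values are illustrative only:

# conf/hadoop-env.sh -- heap for the Hadoop daemons (namenode, jobtracker,
# datanode, tasktracker), in MB. This is Sandy's suggestion; the shipped
# default is 1000. It does not change the heap of the map/reduce child JVMs.
export HADOOP_HEAPSIZE=2000

<!-- conf/hadoop-site.xml -->
<configuration>

  <!-- Heap for each child task JVM (map or reduce). This is the setting
       that fixed the failure above; the shipped default (-Xmx200m in this
       era) is too small for crunching an 8 GB data set with only 4 reduce
       slots. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

  <!-- Reduce tasks allowed to run concurrently on each tasktracker. The
       default of 2 matches the "maximum of 2 reduce tasks on each machine"
       described above; raising it only helps if the node has spare RAM for
       the extra child JVMs. -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>

  <!-- Total reduce tasks for a job. Setting this higher than the number of
       reduce slots makes the reducers run in waves, so each reduce task
       handles a smaller share of the data rather than a full quarter of it. -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>8</value>
  </property>

</configuration>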
