Ah yes, it looks like both the mapper and reducer are using a map structure that gets created on the heap, and all of the values in the reducer are inserted into that map. If you have lots of values for a single key, you're going to run out of heap memory really fast. Do you have a rough estimate of the number of values per key? We had this problem when we first started using MapReduce (we'd create large arrays in the reducer to hold data to sort). It turns out this is generally a very bad idea, and it's particularly bad when the number of values per key is unbounded: sometimes your algorithm will work, and other times you'll get out-of-memory errors. In our case, we redesigned the algorithm so that it doesn't hold lots of values in memory, by taking advantage of Hadoop's sorting and secondary-sorting capabilities.
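A rough sketch of that pattern, applied to this kind of co-occurrence counting (all class and field names below are made up for illustration; none of this is taken from Cloud9):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Composite key: (term, neighbor). The mapper emits one of these per
    // co-occurrence instead of building a per-term map. Hadoop sorts on the
    // full key, but we partition and group on the term alone, so each
    // reduce() call streams through one term's neighbors in sorted order.
    class TermPair implements WritableComparable<TermPair> {
      private final Text term = new Text();      // the "natural" key
      private final Text neighbor = new Text();  // the secondary-sort field

      public void set(String t, String n) { term.set(t); neighbor.set(n); }
      public Text getTerm() { return term; }

      public void write(DataOutput out) throws IOException {
        term.write(out);
        neighbor.write(out);
      }

      public void readFields(DataInput in) throws IOException {
        term.readFields(in);
        neighbor.readFields(in);
      }

      public int compareTo(TermPair other) {
        int cmp = term.compareTo(other.term);
        return cmp != 0 ? cmp : neighbor.compareTo(other.neighbor);
      }
    }

    // Route every key with the same term to the same reducer.
    class TermPartitioner extends Partitioner<TermPair, IntWritable> {
      public int getPartition(TermPair key, IntWritable value, int numPartitions) {
        return (key.getTerm().hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }

    // Treat every key with the same term as a single reduce group.
    class TermGroupingComparator extends WritableComparator {
      protected TermGroupingComparator() { super(TermPair.class, true); }
      public int compare(WritableComparable a, WritableComparable b) {
        return ((TermPair) a).getTerm().compareTo(((TermPair) b).getTerm());
      }
    }

    // In the driver:
    //   job.setPartitionerClass(TermPartitioner.class);
    //   job.setGroupingComparatorClass(TermGroupingComparator.class);

With that wiring, the reducer only has to keep a running count for the current neighbor (the values arrive already sorted), so its memory use stays constant no matter how many values a key has.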
My guess is you won't be able to use the Cloud9 mapper and reducer unless your data changes so that the number of unique values per key is much lower. It's also possible that you're running out of heap space in the mapper as you create the map there. How many items end up in the terms array?

    String[] terms = text.split("\\s+");

Sorry, that's probably not much help to you.

~Ed

On Wed, Oct 6, 2010 at 8:04 AM, Pramy Bhats <pramybh...@googlemail.com> wrote:

> Hi Ed,
>
> I was using the following file for the MapReduce job:
>
> Cloud9/src/dist/edu/umd/cloud9/example/cooccur/ComputeCooccurrenceMatrixStripes.java
>
> thanks,
> --Pramod
>
> On Tue, Oct 5, 2010 at 10:51 PM, ed <hadoopn...@gmail.com> wrote:
>
> > What are the exact files you are using for the mapper and reducer from
> > the cloud9 package?
> >
> > On Tue, Oct 5, 2010 at 2:15 PM, Pramy Bhats <pramybh...@googlemail.com> wrote:
> >
> > > Hi Ed,
> > >
> > > I was trying to benchmark some application code available online:
> > > http://github.com/lintool/Cloud9
> > >
> > > Specifically, the program that computes the cooccurrence matrix with
> > > the stripes approach. However, the code throws a heap-space error even
> > > for very small data sets.
> > >
> > > thanks,
> > > --Pramod
> > >
> > > On Tue, Oct 5, 2010 at 5:50 PM, ed <hadoopn...@gmail.com> wrote:
> > >
> > > > Hi Pramod,
> > > >
> > > > How much memory does each node in your cluster have?
> > > >
> > > > What type of processors do those nodes have? (dual core, quad core,
> > > > dual quad core, etc.)
> > > >
> > > > In what step are you seeing the heap space error (mapper or reducer)?
> > > >
> > > > It's quite possible that your mapper or reducer code could be
> > > > improved to reduce heap space usage.
> > > >
> > > > ~Ed
> > > >
> > > > On Tue, Oct 5, 2010 at 10:05 AM, Marcos Medrado Rubinelli <
> > > > marc...@buscape-inc.com> wrote:
> > > >
> > > > > You can set the mapred.tasktracker.map.tasks.maximum and
> > > > > mapred.tasktracker.reduce.tasks.maximum properties in your
> > > > > mapred-site.xml file, but you may also want to check your current
> > > > > mapred.child.java.opts and mapred.child.ulimit values to make sure
> > > > > they aren't overriding the 4 GB you set globally.
> > > > >
> > > > > Cheers,
> > > > > Marcos
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am trying to run a job on my Hadoop cluster, where I
> > > > > > consistently get a heap space error.
> > > > > >
> > > > > > I increased the heap space to 4 GB in hadoop-env.sh and rebooted
> > > > > > the cluster. However, I still get the heap space error.
> > > > > >
> > > > > > One of the things I want to try is to reduce the number of map
> > > > > > and reduce processes per machine. Currently each machine can have
> > > > > > 2 map and 2 reduce processes running.
> > > > > >
> > > > > > I want to configure Hadoop to run 1 map and 1 reduce per machine
> > > > > > to give each process more heap space.
> > > > > >
> > > > > > How can I configure the number of maps and reducers per node?
> > > > > >
> > > > > > thanks in advance,
> > > > > > -- Pramod
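For reference, the per-node settings Marcos describes above would look roughly like this in mapred-site.xml. The values are examples only, and the tasktracker maximums are daemon-side settings, so the tasktrackers have to be restarted before they take effect:

    <!-- mapred-site.xml (example values; Hadoop 0.20-era property names) -->
    <configuration>
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>1</value>  <!-- one concurrent map task per node -->
      </property>
      <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>1</value>  <!-- one concurrent reduce task per node -->
      </property>
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx4096m</value>  <!-- heap for each task JVM -->
      </property>
    </configuration>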