Re: Running Map and Reduce Sequentially

Kris Jirapinyo Fri, 13 Feb 2009 16:49:03 -0800

I can't afford to have only one reducer as my dataset is huge...right now it
is 50GB and so the output.collect() in the reducer will surely run out of
java heap space.


2009/2/13 Amandeep Khurana <ama...@gmail.com>

> Have only one instance of the reduce task. This will run once your map
> tasks
> are completed. You can set this in your job conf by using
> conf.setNumReducers(1)
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> 2009/2/13 Kris Jirapinyo <kris.jirapi...@biz360.com>
>
> > What do you mean when I have only 1 reducer?
> >
> > On Fri, Feb 13, 2009 at 4:11 PM, Rasit OZDAS <rasitoz...@gmail.com>
> wrote:
> >
> > > Kris,
> > > This is the case when you have only 1 reducer.
> > > If it doesn't have any side effects for you..
> > >
> > > Rasit
> > >
> > >
> > > 2009/2/14 Kris Jirapinyo <kjirapi...@biz360.com>:
> > > > Is there a way to tell Hadoop to not run Map and Reduce concurrently?
> > >  I'm
> > > > running into a problem where I set the jvm to Xmx768 and it seems
> like
> > 2
> > > > mappers and 2 reducers are running on each machine that only has
> 1.7GB
> > of
> > > > ram, so it complains of not being able to allocate memory...(which
> > makes
> > > > sense since 4x768mb > 1.7GB).  So, if it would just finish the Map
> and
> > > then
> > > > start on Reduce, then there would be 2 jvm's running on one machine
> at
> > > any
> > > > given time and thus possibly avoid this out of memory error.
> > > >
> > >
> > >
> > >
> > > --
> > > M. Raşit ÖZDAŞ
> > >
> >
>

Re: Running Map and Reduce Sequentially

Reply via email to