From what I understand, the combiner runs when nodes are idle and you're waiting on a few processes that are taking too long, so the cluster tries to optimize by putting those idle nodes to work on optional preprocessing.
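Whatever ends up triggering it, the practical consequence of "0, 1, or many times" is that the job has to produce the same answer no matter how often the combiner runs. Here is a minimal sketch of that idea in plain Python. It is a toy simulation, not the Hadoop API; the names map_words, sum_counts, and run_job are made up for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Toy mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def sum_counts(pairs):
    """Toy combiner/reducer: sum the values for each key.

    Summation is associative and commutative, so applying this
    function to its own output leaves the totals unchanged.
    """
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def run_job(lines, combiner_passes):
    """Run a toy word count, applying the combiner
    `combiner_passes` times to each map output (hypothetical knob;
    in a real cluster the framework decides this, not the job)."""
    shuffled = []
    for line in lines:
        output = map_words(line)
        for _ in range(combiner_passes):
            output = sum_counts(output)
        shuffled.extend(output)
    # The reducer always runs exactly once over the shuffled data.
    return sum_counts(shuffled)


lines = ["a b a", "b c", "a a c"]
results = [run_job(lines, n) for n in (0, 1, 3)]
assert results[0] == results[1] == results[2]
print(results[0])  # [('a', 4), ('b', 2), ('c', 2)]
```

If the combiner did something that is not safe to repeat (say, computing an average of averages), the three runs could disagree, which is why it is usually written as a partial version of the reducer.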

On Fri, Dec 4, 2009 at 2:02 PM, Raymond Jennings III <[email protected]> wrote:
> I still would like to know how many times it will run given how many mappers
> run. I realize it may never run but what determines how many times if any?
>
> --- On Fri, 12/4/09, Mike Kendall <[email protected]> wrote:
>
>> From: Mike Kendall <[email protected]>
>> Subject: Re: Combiner phase question
>> To: [email protected]
>> Date: Friday, December 4, 2009, 4:59 PM
>>
>> are you sure it can be run in the reduce task? if it does it's still
>> before the reducer is called though... so the flow of your data will
>> still be: data -> mapper(s) -> optional reducer(s) -> reducer(s) ->
>> output_data
>>
>> On Fri, Dec 4, 2009 at 1:42 PM, Owen O'Malley <[email protected]> wrote:
>> > On Fri, Dec 4, 2009 at 12:32 PM, Raymond Jennings III <[email protected]> wrote:
>> >
>> >> Does the combiner run once per data node or one per map task? (That it can
>> >> run multiple times on the same data node after each map task.) Thanks.
>> >
>> > The combiner can run 0, 1, or many times on each data value. It can run in
>> > both the map task and reduce task.
>> >
>> > -- Owen
