Re: Why is Spilled Records always equal to Map output records

Mu Qiao Mon, 13 Jul 2009 18:50:06 -0700

So do you mean that the spill serves both as a checkpoint and as the output
that the reduce tasks fetch?


On Tue, Jul 14, 2009 at 9:40 AM, Dali Kilani <[email protected]> wrote:

> If I am not mistaken (I am new to this stuff), that's because you need to
> have a checkpoint from which you can restart the reduce jobs that use those
> spilled records in case of a reduce task failure.
>
> Dali
> On Mon, Jul 13, 2009 at 6:32 PM, Mu Qiao <[email protected]> wrote:
>
> > Thank you. But why do map outputs need to be written to disk at least
> > once? I think my io.sort.mb is large enough to do in-memory operations.
> > Could you provide me some information about it?
> >
> > On Tue, Jul 14, 2009 at 1:27 AM, Owen O'Malley <[email protected]> wrote:
> >
> > >
> > > On Jul 12, 2009, at 3:55 AM, Mu Qiao wrote:
> > >
> > >> I noticed it on the web console after running several jobs.
> > >> Every one of the jobs has the number of Spilled Records equal to Map
> > >> output records, even if there are only 5 map output records
> > >>
> > >
> > >
> > > This is good. The map outputs need to be written to disk at least once.
> > > So if they are equal, things are fitting in memory. If multiple passes
> > > are needed, you'll see 2x or more spilled records.
> > >
> > >> In the reduce phase, the number of spilled records is also equal to
> > >> reduce input records.
> > >>
> > >
> > > This is reasonable, although 0.19 and 0.20 don't need to spill the
> > > records in the reduce at all, if you make the buffer big enough.
> > >
> > > -- Owen
> > >
> >
> >
> >
> > --
> > Best wishes,
> > Qiao Mu
> >
>
>
>
> --
> Dali Kilani
> ===========
> Phone :  (650) 492-5921 (Google Voice)
> E-Fax  :  (775) 552-2982
>



-- 
Best wishes,
Qiao Mu
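
For reference, here is a toy model of the counter arithmetic Owen describes
in the quoted text above. It is only a sketch, not Hadoop's code: the
function name spilled_records and the buffer_capacity_records parameter are
made up for illustration (the real buffer is sized in bytes through
io.sort.mb), and it assumes that a map task with more than one spill needs
exactly one extra merge pass over all of its records.

```python
def spilled_records(map_output_records, buffer_capacity_records):
    """Toy estimate of the Spilled Records counter for one map task.

    Assumptions (hypothetical, for illustration only):
    - the in-memory buffer holds a fixed number of records;
    - every record is written to disk once in some spill;
    - if there is more than one spill, a single merge pass rewrites
      every record once more.
    """
    if map_output_records == 0:
        return 0

    # Number of spill files needed (ceiling division).
    spills = -(-map_output_records // buffer_capacity_records)

    # Each record is written to disk exactly once during the spills.
    total = map_output_records

    # More than one spill file means the records are merged, i.e. rewritten.
    if spills > 1:
        total += map_output_records

    return total


if __name__ == "__main__":
    print(spilled_records(5, 100))    # everything fits in one spill
    print(spilled_records(250, 100))  # three spills plus one merge pass
```

With 5 map output records and room for 100, the model gives 5, which matches
what the web console showed. With 250 records it gives 500, the "2x" case.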
