So do you mean that the spill does two things at once: it acts as a checkpoint, and it leaves the map output ready for the reduce job to fetch?
On Tue, Jul 14, 2009 at 9:40 AM, Dali Kilani <[email protected]> wrote:
> If I am not mistaken (I am new to this stuff), that's because you need to
> have a checkpoint from which you can restart the reduce jobs that use those
> spilled records in case of a reduce task failure.
>
> Dali
>
> On Mon, Jul 13, 2009 at 6:32 PM, Mu Qiao <[email protected]> wrote:
>
> > Thank you. But why do map outputs need to be written to disk at least
> > once? I think my io.sort.mb is large enough to do in-memory operations.
> > Could you provide me some information about it?
> >
> > On Tue, Jul 14, 2009 at 1:27 AM, Owen O'Malley <[email protected]> wrote:
> >
> > > On Jul 12, 2009, at 3:55 AM, Mu Qiao wrote:
> > >
> > > > I noticed it from the web console after I tried to run several jobs.
> > > > Every one of the jobs has the number of Spilled Records equal to Map
> > > > output records, even if there are only 5 map output records.
> > >
> > > This is good. The map outputs need to be written to disk at least once.
> > > So if they are equal, things are fitting in memory. If multiple passes
> > > are needed, you'll see 2x or more spilled records.
> > >
> > > > In the reduce phase, there are also spilled records, which are equal
> > > > to the reduce input records.
> > >
> > > This is reasonable, although 0.19 and 0.20 don't need to spill the
> > > records in the reduce at all, if you make the buffer big enough.
> > >
> > > -- Owen
> >
> > --
> > Best wishes,
> > Qiao Mu
>
> --
> Dali Kilani
> ===========
> Phone : (650) 492-5921 (Google Voice)
> E-Fax : (775) 552-2982

--
Best wishes,
Qiao Mu
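For anyone reading this thread later, Owen's rule of thumb (Spilled Records equal to map output records means everything fit in memory; multiple passes give 2x or more) can be made concrete with a toy model of the map-side counter. This is a hypothetical Python sketch, not Hadoop's code: `spilled_records`, `records_per_spill` and `sort_factor` are made-up names, `records_per_spill` only stands in for however many records fit in the io.sort.mb buffer before a spill, `sort_factor` only plays the role of io.sort.factor, and the naive merge order and the absence of a combiner are simplifying assumptions.

```python
def spilled_records(map_output_records: int, records_per_spill: int,
                    sort_factor: int = 10) -> int:
    """Toy estimate of one map task's Spilled Records counter.

    Hypothetical model, not Hadoop's implementation:
    - a spill is triggered every `records_per_spill` records (a stand-in for
      how much fits in the io.sort.mb buffer);
    - a single spill file becomes the final map output without being rewritten;
    - several spill files are merged at most `sort_factor` at a time (a
      stand-in for io.sort.factor), and every record a merge writes counts
      as spilled again;
    - no combiner, so merging never drops records.
    """
    if records_per_spill < 1 or sort_factor < 2:
        raise ValueError("need records_per_spill >= 1 and sort_factor >= 2")
    if map_output_records <= 0:
        return 0

    # Ceiling division: how many spill files the map task produces.
    num_spills = (map_output_records + records_per_spill - 1) // records_per_spill
    spilled = map_output_records  # every record is written once by some spill
    if num_spills == 1:
        return spilled  # everything fit in memory: counter equals output (1x)

    # Sizes (in records) of the spill files on disk; the last one may be short.
    files = [records_per_spill] * (num_spills - 1)
    files.append(map_output_records - records_per_spill * (num_spills - 1))

    # Intermediate passes (naive order: smallest files first) until the
    # remaining files can be merged in one final pass.
    while len(files) > sort_factor:
        files.sort()
        merged = sum(files[:sort_factor])
        files = files[sort_factor:] + [merged]
        spilled += merged  # records rewritten by this intermediate merge

    # The final pass writes every record into the single map output file.
    return spilled + map_output_records
```

With these made-up numbers, `spilled_records(5, 100)` gives 5 (the 1x case from the thread), `spilled_records(1000, 100)` gives 2000 (ten spills, one merge pass, 2x), and `spilled_records(2500, 100)` gives 7000 (extra intermediate passes push it past 2x). Real Hadoop chooses its merge passes differently, so only the 1x and 2x cases should be read as illustrating the thread.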
