Thank you. But why do map outputs need to be written to disk at least once? I think my io.sort.mb is large enough to do the operations in memory. Could you give me some more information about this?

On Tue, Jul 14, 2009 at 1:27 AM, Owen O'Malley <[email protected]> wrote:

> On Jul 12, 2009, at 3:55 AM, Mu Qiao wrote:
>
>> I notice it from the web console after I've tried to run several jobs.
>> Every one of the jobs has the number of Spilled Records equal to Map
>> output records, even if there are only 5 map output records
>
> This is good. The map outputs need to be written to disk at least once. So
> if they are equal, things are fitting in memory. If multiple passes are
> needed, you'll see 2x or more spilled records.
>
>> In the reduce phase, there are also spilled records which is equal to
>> reduce input records.
>
> This is reasonable, although 0.19 and 0.20 don't need to spill the records
> in the reduce at all, if you make the buffer big enough.
>
> -- Owen

--
Best wishes,
Qiao Mu
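
For anyone reading these counters later, the rule of thumb in the quoted reply can be sketched in Python. This is only an illustration: the function name classify_spill, its return labels, and the handling of ratios between 1x and 2x are assumptions made here and are not part of Hadoop. It compares the two map-side counter values as plain integers.

```python
def classify_spill(map_output_records: int, spilled_records: int) -> str:
    """Interpret map-side 'Spilled Records' against 'Map output records'.

    Hypothetical helper: the labels returned are illustrative only.
    """
    if map_output_records < 0 or spilled_records < 0:
        raise ValueError("counters must be non-negative")
    if map_output_records == 0:
        return "no map output"
    if spilled_records >= 2 * map_output_records:
        # 2x or more: the quoted reply says multiple passes were needed.
        return "multiple passes"
    if spilled_records == map_output_records:
        # Equal counts: each record was written to disk exactly once.
        return "single write"
    # Anything else is not covered by the rule of thumb in the thread.
    return "inconclusive"
```

With the numbers from the thread, classify_spill(5, 5) falls into the "single write" case, which matches the explanation that equal counters mean the map output fit in memory.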
