On Jul 12, 2009, at 3:55 AM, Mu Qiao wrote:

I notice it from the web console after I've tried to run several jobs. Every one of the jobs has the number of Spilled Records equal to Map output
records, even if there are only 5 map output records.


This is good. The map outputs need to be written to disk at least once. So if they are equal, things are fitting in memory. If multiple passes are needed, you'll see 2x or more spilled records.
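
As a rough illustration of the arithmetic above, here is a minimal Java sketch that takes the two counter values as read off the web console and reports how many times, on average, each map output record was written to disk. The class and method names (SpillCheck, spillPasses, fitsInMemory) are hypothetical and not part of Hadoop, and the numbers in main are just the 5-record case from the question.

```java
// Hypothetical helper, not part of Hadoop: interprets two counter values
// copied by hand from the job's web console.
final class SpillCheck {

    /** Average number of times each map output record was written to disk. */
    static double spillPasses(long mapOutputRecords, long spilledRecords) {
        if (mapOutputRecords <= 0) {
            throw new IllegalArgumentException("mapOutputRecords must be positive");
        }
        return (double) spilledRecords / mapOutputRecords;
    }

    /** True when the counters are equal, i.e. each record was spilled exactly once. */
    static boolean fitsInMemory(long mapOutputRecords, long spilledRecords) {
        return spilledRecords == mapOutputRecords;
    }

    public static void main(String[] args) {
        // Assumed example values: the 5-record job mentioned in the question.
        long mapOutputRecords = 5;
        long spilledRecords = 5;
        System.out.println("passes = " + spillPasses(mapOutputRecords, spilledRecords));
        System.out.println("fits in memory = " + fitsInMemory(mapOutputRecords, spilledRecords));
    }
}
```

A ratio of 1 corresponds to the "equal" case described above; a ratio of 2 or more suggests that multiple passes over the map output were needed.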

In the reduce phase, there are also spilled records, which are equal to reduce
input records.

This is reasonable, although 0.19 and 0.20 don't need to spill the records in the reduce at all, if you make the buffer big enough.

-- Owen
