On Jul 12, 2009, at 3:55 AM, Mu Qiao wrote:
I notice it from the web console after I've tried to run several jobs.
Every one of the jobs has the number of Spilled Records equal to Map
output records, even if there are only 5 map output records.
This is good. The map outputs need to be written to disk at least
once. So if they are equal, things are fitting in memory. If multiple
passes are needed, you'll see 2x or more spilled records.
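
As a rough illustration of the rule of thumb above, here is a minimal sketch that compares the two counter values as read from the web console. The function names and labels are hypothetical and not part of Hadoop; the counter values are assumed to be copied in by hand.

```python
def spill_ratio(spilled_records: int, map_output_records: int) -> float:
    """Spilled records per map output record (hypothetical helper)."""
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


def classify(spilled_records: int, map_output_records: int) -> str:
    """Label a job using the rule of thumb from this thread.

    Equal counters mean each record was written to disk once;
    2x or more means multiple passes were needed.
    Assumption: a ratio below 1 is not discussed in the thread and is
    lumped in with "single pass" here.
    """
    ratio = spill_ratio(spilled_records, map_output_records)
    return "single pass" if ratio <= 1.0 else "multiple passes"


if __name__ == "__main__":
    # The case from the question: 5 map output records, 5 spilled records.
    print(classify(5, 5))
    # A job whose map output did not fit in the sort buffer.
    print(classify(10, 5))
```

A ratio of exactly 1 corresponds to the "this is good" case; anything well above 1 suggests the map-side buffer is too small for the job's output.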
In the reduce phase, there are also spilled records, equal in number
to the reduce input records.
This is reasonable, although 0.19 and 0.20 don't need to spill the
records in the reduce at all, if you make the buffer big enough.
-- Owen