On Sep 21, 2008, at 9:08 PM, Sandy wrote:

Just a quick clarification:

The combiner function acts as an optimization between the map and the reduce phases. Is the output of the combiner phase stored in memory before being handed to reduce? Or is it written to disk and subsequently read from disk by the reduce phase?

The data path doesn't change with a combiner, except that the keys and values are reinstantiated to be given to the combiner. When the map is done, the data is written to disk completely sorted. However, since the sort buffer may fill up before the end of the map, partial results may be sorted, optionally combined, and written to disk along the way. We call these "spills". If there is more than one spill, the spills are read back by a merge sort and written out again as a single file.
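To make the spill-and-merge path concrete, here is a minimal in-memory sketch in Python. It is not Hadoop code: `run_map_side`, `sum_combiner` and `BUFFER_LIMIT` are hypothetical names, each "spill" is a Python list standing in for a sorted file on local disk, and the buffer limit is counted in records for simplicity (the real sort buffer is sized in bytes). It applies the combiner only when spilling, as described above, and does not model any further combining during the merge.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical buffer capacity, counted in records purely for illustration.
BUFFER_LIMIT = 4

def sum_combiner(key, values):
    """Word-count style combiner: collapse all values for one key into a partial sum."""
    yield key, sum(values)

def combine_sorted(records, combiner):
    """Run the combiner over each group of equal keys in an already-sorted run."""
    out = []
    for key, group in groupby(records, key=itemgetter(0)):
        out.extend(combiner(key, (v for _, v in group)))
    return out

def run_map_side(map_output, combiner=None, buffer_limit=BUFFER_LIMIT):
    """Buffer map output, spilling sorted (optionally combined) runs whenever the
    buffer fills, then merge-sort all spills into one sorted output.
    Returns (merged_records, number_of_spills)."""
    spills = []  # each list stands in for one sorted spill file on disk
    buffer = []

    def spill():
        run = sorted(buffer, key=itemgetter(0))  # sort the partial results
        if combiner is not None:
            run = combine_sorted(run, combiner)  # optionally combine them
        spills.append(run)                       # "write" the spill
        buffer.clear()

    for record in map_output:
        buffer.append(record)
        if len(buffer) >= buffer_limit:  # buffer full before the map is done
            spill()
    if buffer:  # whatever remains when the map finishes
        spill()

    if len(spills) == 1:
        return spills[0], 1
    # More than one spill: k-way merge of the sorted runs into a single output.
    return list(heapq.merge(*spills, key=itemgetter(0))), len(spills)
```

With nine word-count records and a buffer of four, the map produces three spills. Because combining happens per spill, a key such as `"the"` can still appear once per spill in the merged output, so the reducer still has to aggregate:

```python
words = "the cat saw the dog and the cat ran".split()
merged, n_spills = run_map_side([(w, 1) for w in words], combiner=sum_combiner)
```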

-- Owen
