On Jan 14, 2009, at 12:46 AM, Rasit OZDAS wrote:

Jim,

As far as I know, there is no operation done after Reducer.

Correct, other than output promotion, which moves the output file to the final filename.

But if you are a little experienced, you already know this.
Ordered list means one final file, or am I missing something?

Creating a single output file has no value and a lot of cost. The real question is how you want the keys divided among the reduces (and therefore among the output files). The default partitioner hashes the key and takes the result modulo the number of reduces, which "stripes" the keys across the output files. You can use mapred.lib.InputSampler to generate good partition keys and mapred.lib.TotalOrderPartitioner to get completely sorted output based on those partition keys. With the total order partitioner, each reduce gets an increasing range of keys, so the output files taken in order have all of the nice properties of a single sorted file without the costs.
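
To make the difference concrete, here is a rough standalone sketch of the two partitioning rules in plain Java. It does not use any Hadoop classes: the class name PartitionSketch, the method names, and the split points "c" and "m" are all made up for illustration. The hash rule mirrors the usual "hash, mask the sign bit, mod by the number of reduces" arithmetic, and the range rule is a binary search over sorted split points, which is the general idea behind a total order partitioner. A real job would get its split points from a sampler rather than hard-coding them.

```java
import java.util.Arrays;

// Illustration only: a standalone model of the two partitioning rules,
// not Hadoop code. All names here are hypothetical.
class PartitionSketch {

    // Hash partitioning: mask off the sign bit so the result is
    // non-negative, then take it modulo the number of reduces.
    // Neighbouring keys usually land in different partitions ("striping").
    static int hashPartition(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Range partitioning: splitPoints must be sorted and contain
    // (number of reduces - 1) entries. Keys below splitPoints[0] go to
    // partition 0; a key equal to splitPoints[i] goes to partition i + 1.
    static int rangePartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry", "grape", "mango", "peach", "plum"};
        String[] splitPoints = {"c", "m"}; // assumed split points for 3 reduces
        for (String key : keys) {
            System.out.println(key
                + " hash=" + hashPartition(key, 3)
                + " range=" + rangePartition(key, splitPoints));
        }
    }
}
```

With the range rule, every key in partition 0 sorts before every key in partition 1, and so on, so reading the (individually sorted) output files in partition order gives one globally sorted sequence. With the hash rule no such ordering holds across files.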

-- Owen
