On Sat, Sep 10, 2011 at 12:33 PM, Meng Mao <[email protected]> wrote:
> Is there a way to collate the possibly large number of map output files,
> though?

You can make fewer mappers by setting the mapred.min.split.size to define the
smallest input that will be given to a mapper.

There isn't currently a way of getting a collated, but unsorted list of
key/value pairs. For most applications, the in memory sort is fairly cheap
relative to the shuffle and other parts of the processing.

-- Owen
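
To make the mapred.min.split.size suggestion above concrete, here is a rough sketch of how raising it cuts the number of map tasks (and so the number of map output files). It assumes the split-size rule used by the old-API FileInputFormat, max(minSize, min(goalSize, blockSize)), and a single splittable input file. The function names and the file and block sizes are made up for illustration, and the count is only an estimate because the real code lets the last split run slightly over.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def compute_split_size(goal_size, min_size, block_size):
    # Assumed rule: max(minSize, min(goalSize, blockSize)).
    return max(min_size, min(goal_size, block_size))


def estimate_map_tasks(file_size, block_size, min_split_size=1, requested_maps=1):
    # Hypothetical helper: approximate map task count for one splittable file.
    # goal_size is the input size divided by the requested number of maps.
    goal_size = file_size // max(requested_maps, 1)
    split_size = compute_split_size(goal_size, min_split_size, block_size)
    # Integer ceiling division: one map task per split.
    return -(-file_size // split_size)


if __name__ == "__main__":
    # Illustrative numbers only: a 10 GiB file stored in 64 MiB blocks.
    print(estimate_map_tasks(10 * GIB, 64 * MIB))                            # 160
    print(estimate_map_tasks(10 * GIB, 64 * MIB, min_split_size=256 * MIB))  # 40
```

With the minimum split size left at its default, the block size decides the split, so the job gets roughly one mapper per block. Setting the minimum above the block size makes each mapper read several blocks, at the cost of some data locality.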
