Good job. In MapReduce you can write your own Partitioner, which is the code that determines which reducer receives which keys.
For simplicity, assume you're running 26 reducers. Your custom Partitioner will make sure the first reducer gets all keys starting with 'a', the second all keys starting with 'b', and so on. Since keys are sorted within each reducer, you can concatenate your 26 output files in reducer order to get an overall sorted output.

Does that make sense?

Kai

On 13.08.2011 at 17:44, Sean Hogan wrote:

> Oh, okay, got it - if there was more than one reducer then there needs to be
> a way to guarantee that the overall output from multiple reducers will still
> be sorted.
>
> So I want to look for where the implementation of the shuffle/sort phase is
> located. Or find something on how Hadoop implements the MapReduce
> sort/shuffle phase.
>
> Thanks!
>
> -Sean

-- 
Kai Voigt
[email protected]
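
P.S. Here is a rough sketch of the first-letter idea in plain Java, with no Hadoop dependency, so you can try it locally. The class and method names (FirstLetterPartitionSketch, partitionFor, sortViaPartitions) are made up for illustration, and it assumes every key is non-empty and starts with a lowercase letter 'a' to 'z'. In a real job the body of partitionFor would go into the getPartition(key, value, numPartitions) method of a class extending org.apache.hadoop.mapreduce.Partitioner. The rest only simulates the per-reducer sort and the concatenation of the output files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone simulation; not Hadoop's own implementation.
final class FirstLetterPartitionSketch {
    // Assumption taken from the example: one reducer per letter.
    static final int NUM_REDUCERS = 26;

    // Returns the reducer index for a key: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25.
    // Assumes a non-empty key starting with a lowercase ASCII letter.
    static int partitionFor(String key) {
        char first = key.charAt(0);
        if (first < 'a' || first > 'z') {
            throw new IllegalArgumentException("key must start with a-z: " + key);
        }
        return first - 'a';
    }

    // Routes each key to its reducer, sorts within each reducer,
    // then concatenates the reducer outputs in reducer order.
    static List<String> sortViaPartitions(List<String> keys) {
        List<List<String>> reducers = new ArrayList<>();
        for (int i = 0; i < NUM_REDUCERS; i++) {
            reducers.add(new ArrayList<>());
        }
        for (String key : keys) {
            reducers.get(partitionFor(key)).add(key);
        }
        List<String> output = new ArrayList<>();
        for (List<String> reducerKeys : reducers) {
            Collections.sort(reducerKeys); // stands in for the sort each reducer sees
            output.addAll(reducerKeys);    // stands in for concatenating output files
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("pear", "apple", "banana", "avocado", "plum");
        // prints [apple, avocado, banana, pear, plum]
        System.out.println(sortViaPartitions(keys));
    }
}
```

The concatenated result is globally sorted because every key in reducer i compares lower than every key in reducer i+1. Any partitioning that preserves this ordering between reducers would work equally well; splitting by first letter is just the easiest one to picture, and it can load the reducers unevenly if some letters are much more common than others.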
