Have you done any testing to confirm that the order of the output keys is actually changed?
Merge-sort on its own is a 'stable' algorithm, and so the order should not change unless different variations on sorting are used (in memory before spilling to disk, for instance).

Thanks,
Stu

-----Original Message-----
From: Ted Dunning <[EMAIL PROTECTED]>
Sent: Monday, October 1, 2007 10:32pm
To: [email protected]
Subject: Re: computing conditional probabilities with Hadoop?

Actually, it would be almost as useful to be able to have a "multi-reduce". In such a system, you would specify multiple input/map pairs. The reduce function signature would then be something like:

    reduce(WritableComparable key, OutputCollector, Reporter, Iterator ...)

Where the output of each set of maps would be given its own iterator.

I didn't mention this alternative earlier because I figured it would be a much bigger leap than just ordering the reduce values. It would, however, be very useful when it comes to co-grouping operations.

On 10/1/07 6:17 PM, "Ted Dunning" wrote:

> This is a common requirement.
>
> Left unchanged would be fine but is probably very hard to enforce because of
> the many map tasks and some uncertainty about which maps finished first.
> Similarly useful would be the ability to require a particular sort ordering
> on reduce values.
>
>
> On 10/1/07 6:05 PM, "Chris Dyer" wrote:
>
>> Does anyone know if Hadoop guarantees (can be made to guarantee) that the
>> relative order of keys that are equal will be left unchanged?
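
To illustrate what 'stable' means in the merge-sort point at the top of this message, here is a minimal standalone Java sketch. It is not Hadoop code and says nothing about what Hadoop's sort or shuffle actually does; the class StableMergeDemo, the Rec type, and the tagsFor helper are all made up for illustration. Each record carries a key plus a tag recording its input position, and the merge step always takes from the left run on equal keys, which is the property that keeps equal keys in their original relative order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; none of these names come from Hadoop.
class StableMergeDemo {

    // A record with a sort key and a tag noting its original input position.
    static final class Rec {
        final String key;
        final String tag;
        Rec(String key, String tag) { this.key = key; this.tag = tag; }
    }

    // Merge two runs that are each already sorted by key.
    // On equal keys the left run wins; this tie-break is what makes it stable.
    static List<Rec> merge(List<Rec> left, List<Rec> right) {
        List<Rec> out = new ArrayList<>(left.size() + right.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).key.compareTo(right.get(j).key) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    // Top-down merge sort built on the stable merge above.
    static List<Rec> mergeSort(List<Rec> in) {
        if (in.size() <= 1) return new ArrayList<>(in);
        int mid = in.size() / 2;
        return merge(mergeSort(in.subList(0, mid)),
                     mergeSort(in.subList(mid, in.size())));
    }

    // Concatenate the tags of all records with the given key, in list order.
    static String tagsFor(List<Rec> recs, String key) {
        StringBuilder sb = new StringBuilder();
        for (Rec r : recs) {
            if (r.key.equals(key)) sb.append(r.tag);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(
            new Rec("b", "1"), new Rec("a", "2"), new Rec("b", "3"),
            new Rec("a", "4"), new Rec("b", "5"));
        List<Rec> sorted = mergeSort(input);
        System.out.println(tagsFor(sorted, "a")); // prints 24
        System.out.println(tagsFor(sorted, "b")); // prints 135
    }
}
```

If the comparison in merge used < instead of <=, equal keys from the right run would be emitted first and the tags would no longer come out in input order. That is the kind of variation that could change the order of equal keys even though merge sort is normally stable.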
