Have you done any testing to confirm that the order of the output keys
actually changes?

Merge sort on its own is a stable algorithm, so the relative order of
equal keys should not change unless different sorting variants are
combined (an in-memory sort before spilling to disk, for instance).
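
To make the stability point concrete, here is a small Python sketch. It is
not Hadoop code: the (key, tag) record layout and the two "runs" are made up
for illustration, with the tag recording where each record came from. It
sorts each run with a stable sort and then combines them with a stable merge,
which is roughly the behaviour I would expect if every stage is a merge sort.

```python
import heapq
from operator import itemgetter

# Hypothetical (key, tag) records; the tag marks the original position.
run_a = [("b", "a1"), ("a", "a2"), ("b", "a3")]
run_b = [("a", "b1"), ("b", "b2"), ("a", "b3")]

by_key = itemgetter(0)

# sorted() is stable: records with equal keys keep their relative order.
sorted_a = sorted(run_a, key=by_key)
sorted_b = sorted(run_b, key=by_key)

# heapq.merge is stable too: on equal keys, earlier inputs come out first.
merged = list(heapq.merge(sorted_a, sorted_b, key=by_key))

# Collect the tags seen for each key, in output order.
tags_for_a = [tag for key, tag in merged if key == "a"]
tags_for_b = [tag for key, tag in merged if key == "b"]
print(tags_for_a)
print(tags_for_b)
```

Within each key, the tags come out in the order the records appeared in
their run, with run_a ahead of run_b. If one of the stages used an unstable
sort instead, that guarantee would no longer hold.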

Thanks,
Stu


-----Original Message-----
From: Ted Dunning <[EMAIL PROTECTED]>
Sent: Monday, October 1, 2007 10:32pm
To: [email protected]
Subject: Re: computing conditional probabilities with Hadoop?



Actually, it would be almost as useful to be able to have a "multi-reduce".

In such a system, you would specify multiple input/map pairs.  The reduce
function signature would then be something like:

    reduce(WritableComparable key, OutputCollector, Reporter, Iterator ...)

where the output of each set of maps would be given its own iterator.

I didn't mention this alternative earlier because I figured it would be a
much bigger leap than just ordering the reduce values.  It would, however,
be very useful when it comes to co-grouping operations.
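
As a rough illustration of the idea, here is a minimal in-memory sketch in
Python rather than Hadoop's Java API. The names multi_reduce and
conditional_probs are hypothetical, and the sample counts are invented. Each
mapped input stands in for the output of one input/map pair, and the reduce
function receives one iterator per input for every key.

```python
from collections import defaultdict

def multi_reduce(reduce_fn, *mapped_inputs):
    """Group each input by key, then call reduce_fn(key, it_0, it_1, ...).

    Hypothetical helper: each mapped input is an iterable of (key, value)
    pairs, standing in for the output of one input/map pair.
    """
    grouped = [defaultdict(list) for _ in mapped_inputs]
    for groups, pairs in zip(grouped, mapped_inputs):
        for key, value in pairs:
            groups[key].append(value)

    results = []
    for key in sorted(set().union(*grouped)):
        # One iterator per input; empty if that input never emitted the key.
        iterators = [iter(groups.get(key, [])) for groups in grouped]
        results.extend(reduce_fn(key, *iterators))
    return results

def conditional_probs(key, marginal_counts, joint_counts):
    """Emit (a, b, P(b | a)) given counts of a and counts of (a, b)."""
    total = sum(marginal_counts)
    for other, count in joint_counts:
        yield (key, other, count / total)

# Invented sample data: counts of a, and counts of (a, b) keyed by a.
marginals = [("x", 4), ("y", 2)]
joints = [("x", ("p", 1)), ("x", ("q", 3)), ("y", ("p", 2))]

probs = multi_reduce(conditional_probs, marginals, joints)
print(probs)
```

Because the marginal counts and the joint counts arrive on separate
iterators, the reducer does not need to tag values or depend on their sort
order to tell them apart, which is what makes this attractive for
co-grouping.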


On 10/1/07 6:17 PM, "Ted Dunning"  wrote:

> 
> This is a common requirement.
> 
> Left unchanged would be fine but is probably very hard to enforce because of
> the many map tasks and some uncertainty about which maps finished first.
> Similarly useful would be the ability to require a particular sort ordering
> on reduce values.
> 
> 
> On 10/1/07 6:05 PM, "Chris Dyer"  wrote:
> 
>> Does anyone know if Hadoop guarantees (can be made to guarantee) that the
>> relative order of keys that are equal will be left unchanged?
> 
