On Jan 22, 2009, at 7:25 AM, Brian MacKay wrote:
Is there a way to set the order of the keys in reduce as shown below,
no matter what order the collection in MAP occurs in?
The keys to reduce are *always* sorted. If the default order is not
correct, you can change the compare function.
As Tom points out, the critical thing is making sure that all of the
keys that you need to group together go to the same reduce. So let's
make it a little more concrete and say that you have:
  public class TextPair implements Writable {
    public TextPair() {}
    public void set(String left, String right);
    public String getLeft();
    ...
  }
And your map 0 does:
  key.set("CAT", "B");
  output.collect(key, value);
  key.set("DOG", "A");
  output.collect(key, value);
While map 1 does:
  key.set("CAT", "A");
  output.collect(key, value);
  key.set("DOG", "B");
  output.collect(key, value);
If you want to make sure that all of the cats go to the same reduce
and that all of the dogs go to the same reduce, you need to set the
partitioner. It would look like:
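  public class MyPartitioner<V> implements Partitioner<TextPair, V> {
    public void configure(JobConf job) {}

    public int getPartition(TextPair key, V value,
                            int numReduceTasks) {
      // Partition on the left part only, so every key with the same
      // left value lands on the same reduce. The mask keeps the
      // result non-negative.
      return (key.getLeft().hashCode() & Integer.MAX_VALUE) %
             numReduceTasks;
    }
  }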
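To see why this keeps the cats together, here is a minimal plain-Java sketch with no Hadoop classes. PartitionSketch and partitionFor are names made up for illustration, and the count of 4 reduces is just an assumption. It applies the same hash-and-modulo to the left part only:

```java
// Plain-Java stand-in for the getPartition logic above; not Hadoop API.
class PartitionSketch {
    // Hypothetical helper: same arithmetic as MyPartitioner.getPartition,
    // but taking the left string directly instead of a TextPair.
    static int partitionFor(String left, int numReduceTasks) {
        // Mask off the sign bit so the result of % is never negative.
        return (left.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reduces = 4; // assumed number of reduces
        // {left, right} pairs as emitted by map 0 and map 1
        String[][] keys = {
            {"CAT", "B"}, {"DOG", "A"}, {"CAT", "A"}, {"DOG", "B"}
        };
        for (String[] k : keys) {
            System.out.println(k[0] + "," + k[1] + " -> reduce "
                + partitionFor(k[0], reduces));
        }
    }
}
```

Since the right part never enters the computation, both CAT keys print the same reduce number, and likewise for both DOG keys, whichever map emitted them.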
Then define a raw comparator that sorts based on both the left and
right part of the TextPair, and you are set.
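The ordering itself is simple: compare the left parts, and fall back to the right parts on a tie. The sketch below shows that ordering on plain String pairs, as an illustration only. PairOrderSketch, LEFT_THEN_RIGHT and sortedKeys are hypothetical names, and a real raw comparator would apply the same rule to the serialized bytes of the TextPair rather than to deserialized objects:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of the sort order; not a Hadoop RawComparator.
class PairOrderSketch {
    // Order by the left part first, then by the right part on a tie.
    static final Comparator<String[]> LEFT_THEN_RIGHT = (a, b) -> {
        int c = a[0].compareTo(b[0]);
        return c != 0 ? c : a[1].compareTo(b[1]);
    };

    // Hypothetical helper: returns the keys as "left,right" strings in
    // sorted order, leaving the input array untouched.
    static List<String> sortedKeys(String[][] keys) {
        String[][] copy = keys.clone();
        Arrays.sort(copy, LEFT_THEN_RIGHT);
        List<String> out = new ArrayList<>();
        for (String[] k : copy) {
            out.add(k[0] + "," + k[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // The four keys emitted by map 0 and map 1 above.
        String[][] keys = {
            {"CAT", "B"}, {"DOG", "A"}, {"CAT", "A"}, {"DOG", "B"}
        };
        // prints [CAT,A, CAT,B, DOG,A, DOG,B]
        System.out.println(sortedKeys(keys));
    }
}
```

With the partitioner keeping each left value on one reduce and this ordering applied to the keys, the reduce sees CAT,A before CAT,B regardless of which map produced them.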
-- Owen