On Aug 24, 2010, at 2:21 AM, Teodor Macicas wrote:

Hello,

Let's say that we have two map outputs that are sorted before the reducer starts. It doesn't matter what {a,b0,b1,c} mean, but let's assume that b0=b1.
Map output1 : a, b0
Map output2:  c, b1
In this case the sorted data can come out in 2 different orders:
1. {a,b0,b1,c}  and
2. {a,b1,b0,c}  since b0=b1 .

In my particular problem I want to distinguish between b0 and b1. Basically, they are numbers, but I have extra info on which the comparison should be based. Now, the question is: how can I change Hadoop's default behaviour in order to control the sort order of equal keys?

You need to extend the keys with the extra information to sort on. To still get exactly one call to reduce for each logical key, you define a grouping comparator, which determines when two keys belong to distinct calls to reduce. Look at the SecondarySort example in MapReduce: http://bit.ly/a9B7hh
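
To make the idea concrete, here is a minimal sketch in plain Java that only simulates the two comparators; it uses no Hadoop classes. The names CompositeKey, SORT_ORDER, GROUPING and sortAndGroup are hypothetical and exist only for this illustration, and the values 1, 2, 3 stand in for a, b, c from the question. In a real job the same two orderings would presumably be registered on the job as its sort comparator and its grouping comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of secondary sort. Nothing here is Hadoop API;
// all names are hypothetical and chosen for illustration only.
final class SecondarySortSketch {

    // Composite key: the logical key plus the extra info used to break ties.
    static final class CompositeKey {
        final int logicalKey;
        final String extraInfo;

        CompositeKey(int logicalKey, String extraInfo) {
            this.logicalKey = logicalKey;
            this.extraInfo = extraInfo;
        }

        @Override
        public String toString() {
            return logicalKey + "/" + extraInfo;
        }
    }

    // Sort order: logical key first, then the extra info, so b0 and b1
    // always come out in a defined order even though their numbers are equal.
    static final Comparator<CompositeKey> SORT_ORDER = (x, y) -> {
        int byKey = Integer.compare(x.logicalKey, y.logicalKey);
        return byKey != 0 ? byKey : x.extraInfo.compareTo(y.extraInfo);
    };

    // Grouping order: looks at the logical key only, so keys that differ
    // just in their extra info still land in the same simulated reduce call.
    static final Comparator<CompositeKey> GROUPING =
        (x, y) -> Integer.compare(x.logicalKey, y.logicalKey);

    // Sorts the keys, then cuts the sorted run into groups; each group
    // corresponds to one simulated call to reduce.
    static List<List<CompositeKey>> sortAndGroup(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);

        List<List<CompositeKey>> groups = new ArrayList<>();
        for (CompositeKey key : sorted) {
            boolean startsNewGroup = groups.isEmpty()
                || GROUPING.compare(groups.get(groups.size() - 1).get(0), key) != 0;
            if (startsNewGroup) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Map output 1 is (a, b0) and map output 2 is (c, b1), with a=1, b=2, c=3.
        List<CompositeKey> mapOutput = Arrays.asList(
            new CompositeKey(1, "a"), new CompositeKey(2, "b0"),
            new CompositeKey(3, "c"), new CompositeKey(2, "b1"));

        // prints [[1/a], [2/b0, 2/b1], [3/c]]
        System.out.println(sortAndGroup(mapOutput));
    }
}
```

Because the grouping comparison ignores the extra info, b0 and b1 reach the same reduce call, and the sort comparison fixes their order inside it. In an actual Hadoop job the partitioner also has to depend only on the logical key, otherwise b0 and b1 could be sent to different reducers; the SecondarySort example sets a custom partitioner for that reason.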

-- Owen
