Re: Hadoop sorting algorithm on equal keys

Owen O'Malley Tue, 24 Aug 2010 09:18:41 -0700


On Aug 24, 2010, at 2:21 AM, Teodor Macicas wrote:

Hello,
Let's say that we have two map outputs which will be sorted before the reducer starts. It doesn't matter what {a,b0,b1,c} mean, but let's assume that b0=b1.
Map output1 : a, b0
Map output2:  c, b1
In this case we can have 2 different sets of sorted data:
1. {a,b0,b1,c}  and
2. {a,b1,b0,c}  since b0=b1 .
In my particular problem I want to distinguish between b0 and b1. Basically, they are numbers, but I have extra info on which my comparison will be made. Now, the question is: how can I change Hadoop's default behaviour in order to control the sorting algorithm on equal keys?

You need to extend the keys with the extra information to sort on. To get exactly one call to reduce for each logical key, you define a grouping comparator that determines when two keys should be distinct calls to reduce. Look at the SecondarySort example in MapReduce. http://bit.ly/a9B7hh


-- Owen
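
The approach described in the reply can be sketched in plain Java, without any Hadoop dependency, to show how the two comparators interact. This is only an illustration under stated assumptions: the class SecondarySortSketch, the CompositeKey type, and its integer "extra" field are hypothetical names, not part of Hadoop or of the SecondarySort example. In a real job the two comparators would be registered on the job (to my knowledge via Job.setSortComparatorClass and Job.setGroupingComparatorClass in the org.apache.hadoop.mapreduce API), and the partitioner would also need to partition on the logical key only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {
    // Hypothetical composite key: the logical key plus the extra
    // information that decides the order among equal logical keys.
    static final class CompositeKey {
        final int natural; // logical key (b0 and b1 share this value)
        final int extra;   // assumed tie-breaker carried with the key

        CompositeKey(int natural, int extra) {
            this.natural = natural;
            this.extra = extra;
        }

        @Override
        public String toString() {
            return natural + ":" + extra;
        }
    }

    // Plays the role of the sort comparator: orders by the logical key,
    // then by the extra information.
    static final Comparator<CompositeKey> SORT =
            Comparator.<CompositeKey>comparingInt(k -> k.natural)
                      .thenComparingInt(k -> k.extra);

    // Plays the role of the grouping comparator: looks at the logical
    // key only, so keys differing just in "extra" share one reduce call.
    static final Comparator<CompositeKey> GROUP =
            Comparator.<CompositeKey>comparingInt(k -> k.natural);

    // Simulates the shuffle: sort everything with SORT, then start a new
    // group (one reduce call) whenever GROUP reports a different key.
    static List<List<CompositeKey>> sortAndGroup(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);

        List<List<CompositeKey>> groups = new ArrayList<>();
        CompositeKey previous = null;
        for (CompositeKey key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
                new CompositeKey(1, 0),  // a
                new CompositeKey(2, 1),  // b1
                new CompositeKey(3, 0),  // c
                new CompositeKey(2, 0)); // b0
        // prints [[1:0], [2:0, 2:1], [3:0]]
        System.out.println(sortAndGroup(mapOutput));
    }
}
```

With this arrangement b0 always precedes b1 inside the single group for their shared logical key, so the order within a reduce call is determined by the extra information and not by which map output happened to arrive first.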
