It seems you have to insert a tag into each map output tuple that tells which map output the tuple came from. On the reduce side, you then write your own sort that takes the tag into account.
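A minimal sketch of the tagging idea, in plain Java without any Hadoop classes so it can be run on its own. The names `Tagged`, `KEY_THEN_TAG` and `sortTagged` are made up for illustration, and the numeric values for a, b0, b1 and c are assumed; only the fact that b0 = b1 comes from the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TaggedSortSketch {
    // Hypothetical composite record: the numeric key plus a tag that
    // records which map output the tuple came from.
    static final class Tagged {
        final int key;
        final int tag;
        final String label;

        Tagged(int key, int tag, String label) {
            this.key = key;
            this.tag = tag;
            this.label = label;
        }
    }

    // Compare by key first; on equal keys, fall back to the tag so the
    // relative order of b0 and b1 is fixed rather than arbitrary.
    static final Comparator<Tagged> KEY_THEN_TAG =
            Comparator.comparingInt((Tagged t) -> t.key)
                      .thenComparingInt(t -> t.tag);

    // Returns a sorted copy and leaves the input list untouched.
    static List<Tagged> sortTagged(List<Tagged> records) {
        List<Tagged> sorted = new ArrayList<>(records);
        sorted.sort(KEY_THEN_TAG);
        return sorted;
    }

    public static void main(String[] args) {
        List<Tagged> records = new ArrayList<>();
        // Assumed values: a = 1, b0 = b1 = 5, c = 9.
        records.add(new Tagged(5, 2, "b1")); // map output 2
        records.add(new Tagged(9, 2, "c"));  // map output 2
        records.add(new Tagged(1, 1, "a"));  // map output 1
        records.add(new Tagged(5, 1, "b0")); // map output 1
        for (Tagged t : sortTagged(records)) {
            System.out.println(t.label);
        }
    }
}
```

In a real job the same comparison would probably go into the compareTo of a composite WritableComparable key, or into a comparator registered with the job (for example via Job.setSortComparatorClass), so that the framework's own sort applies it; whether that fits depends on how the reducer needs its keys grouped.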
-Gang

----- Original Message -----
From: Teodor Macicas <[email protected]>
To: "[email protected]" <[email protected]>
Sent: 2010/8/24 (Tue) 5:21:39 AM
Subject: Hadoop sorting algorithm on equal keys

Hello,

Let's say that we have two map outputs which will be sorted before the reducer starts. It doesn't matter what {a,b0,b1,c} mean, but let's assume that b0=b1.

Map output1: a, b0
Map output2: c, b1

In this case we can have 2 different sets of sorted data:
1. {a,b0,b1,c} and
2. {a,b1,b0,c}
since b0=b1.

In my particular problem I want to distinguish between b0 and b1. Basically, they are numbers, but I have extra info on which my comparison will be made.

Now, the question is: how can I change Hadoop's default behaviour in order to control the sorting algorithm on equal keys?

Thank you in advance.

Best,
Teodor
