The mapper should emit key (k,1) with value (1,v) for each line k,v in file1, and key (k,2) with value (2,v) for each line k,v in file2. Your partition function should look at only the first member of the key tuple, but the sort should order on both members.
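As a rough illustration, here is a small in-memory Python simulation of that tagging, partitioning and ordering. It is only a sketch, not Hadoop code: the names map_record, partition and shuffle, the crc32-based partitioner and the sample lines are all made up for the example. It also assumes values are grouped on k alone (what a grouping comparator would do in Hadoop), so that both tagged values for a key arrive in one reduce call.

```python
from itertools import groupby
from zlib import crc32


def map_record(line, tag):
    """Turn a 'k,v' line into the pair ((k, tag), (tag, v))."""
    k, v = line.strip().split(",", 1)
    return (k, tag), (tag, v)


def partition(key, num_reducers):
    """Pick a reducer from the first member of the key only."""
    k, _tag = key
    return crc32(k.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Partition, sort on the full key, then group on k alone (assumed)."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))

    groups = []
    for bucket in buckets:
        # Order on both members of the key, so tag 1 sorts before tag 2.
        bucket.sort(key=lambda kv: kv[0])
        for _k, run in groupby(bucket, key=lambda kv: kv[0][0]):
            run = list(run)
            # The group is presented under the first key seen for this k.
            groups.append((run[0][0], [value for _key, value in run]))
    return groups


# Hypothetical sample data: "b" appears in both files, "a" only in file1.
file1 = ["a,old_a", "b,old_b"]
file2 = ["b,new_b"]

pairs = [map_record(line, 1) for line in file1]
pairs += [map_record(line, 2) for line in file2]
groups = shuffle(pairs, num_reducers=2)
```

Because the sort covers both key members while the partitioner and the grouping use only k, the file2 value, when there is one, always ends up last in its group.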
Assuming values are grouped on k alone, your reducer will get data like this: (k,1), [(1,v)] or like this: (k,1), [(1,v1),(2,v2)]. In the first case, it should emit k,v. In the second, k,v2. More simply, it should just emit the last value in the reduce group.

In actual practice, you should probably use something fancier than an integer to tag the data. You will also have to find some kind of appropriate tuple structure. Pig, Cascading, Plume and Hive would make this easier than straight Java, but all of these techniques would work.

On Tue, Feb 8, 2011 at 4:26 PM, Gururaj S Mayya <[email protected]> wrote:
> Any pointers as to how this could be done?
