Hi Gump,

MapReduce is a good fit for this type of problem (joins).
I hope this helps you solve the problem you described.

1. Map output key and value classes: Use Text.class as the map output key class and write a custom value class (CombinedValue.class). The value class must be able to hold the values from both files (a.txt and b.txt), as shown below.

    class CombinedValue implements Writable {
        String name;
        int age;
        String address;
        boolean isLeft; // flag to identify which file the record came from
        // write(DataOutput) and readFields(DataInput) must also be implemented
    }

2. Mapper: Write a map() function that can parse records from both files (a.txt, b.txt) and emits the common output key and value classes, with the id as the key.

3. Partitioner: Write the partitioner so that all (key, value) pairs with the same key are sent to the same reducer. Hadoop's default HashPartitioner already behaves this way.

4. Reducer: In the reduce() function, you will receive the records for a given key from both files, and you can combine them easily.

Thanks
Devaraj

________________________________________
From: liuzhg [liu...@cernet.com]
Sent: Tuesday, May 29, 2012 3:45 PM
To: common-user@hadoop.apache.org
Subject: How to mapreduce in the scenario

Hi,

I wonder that if Hadoop can solve effectively the question as following:

==========================================
input file: a.txt, b.txt
result: c.txt

a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...

b.txt:
id1,address1,...
id2,address2,...
id3,address3,...

c.txt
id1,name1,age1,address1,...
id2,name2,age2,address2,...
========================================

I know that it can be done well by database.
But I want to handle it with hadoop if possible.
Can hadoop meet the requirement?

Any suggestion can help me. Thank you very much!

Best Regards,

Gump
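
For reference, the reduce-side join outlined in steps 1 to 4 of the reply can be sketched in plain Java, without any Hadoop dependencies. This is only an in-memory illustration of the data flow, not Hadoop code: the class name ReduceSideJoinSketch, the Tagged holder (a stand-in for CombinedValue), and the sample records are all assumptions made for the example, and a TreeMap stands in for the shuffle that groups values by key. It assumes the id is the first comma-separated field and performs an inner join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Hypothetical stand-in for CombinedValue: the fields after the id,
    // plus a flag saying whether the record came from a.txt (left side).
    static final class Tagged {
        final boolean isLeft;
        final String rest;
        Tagged(boolean isLeft, String rest) {
            this.isLeft = isLeft;
            this.rest = rest;
        }
    }

    // "Map" step: split one line into (id, remaining fields) and tag it
    // with its source. The map argument simulates the shuffle phase.
    static void map(String line, boolean isLeft, Map<String, List<Tagged>> shuffle) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            return; // skip lines that have no id separator
        }
        String id = line.substring(0, comma);
        String rest = line.substring(comma + 1);
        shuffle.computeIfAbsent(id, k -> new ArrayList<>()).add(new Tagged(isLeft, rest));
    }

    // "Reduce" step: for one id, pair every a.txt record with every
    // b.txt record. Ids present in only one file produce no output.
    static List<String> reduce(String id, List<Tagged> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged v : values) {
            if (v.isLeft) {
                left.add(v.rest);
            } else {
                right.add(v.rest);
            }
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(id + "," + l + "," + r);
            }
        }
        return out;
    }

    // Runs map over both inputs, then reduce over each grouped key.
    public static List<String> join(List<String> a, List<String> b) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        for (String line : a) {
            map(line, true, shuffle);
        }
        for (String line : b) {
            map(line, false, shuffle);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample records shaped like a.txt and b.txt, without the elided fields.
        List<String> a = Arrays.asList(
            "id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4");
        List<String> b = Arrays.asList(
            "id1,address1", "id2,address2", "id3,address3");
        for (String line : join(a, b)) {
            System.out.println(line);
        }
    }
}
```

In a real job the same roles would be played by a Mapper, the shuffle, and a Reducer. The point of the flag is that the reducer sees values from both files mixed together under one key and needs a way to tell them apart before combining them.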