Hi Mike, Nitin, Devaraj, Soumya, Samir, Robert,
Thank you all for your suggestions.

What I actually want to know is whether Hadoop has any performance advantage over a conventional database for this kind of problem (joining data).

Best Regards,
Gump

On Tue, May 29, 2012 at 6:53 PM, Soumya Banerjee <soumya.sbaner...@gmail.com> wrote:

Hi,

You can also try the Hadoop reduce-side join functionality. Look into contrib/datajoin/hadoop-datajoin-*.jar for the base map and reduce classes that implement it.

Regards,
Soumya.

On Tue, May 29, 2012 at 4:10 PM, Devaraj k <devara...@huawei.com> wrote:

> Hi Gump,
>
> MapReduce fits well for solving these types of (join) problems.
>
> I hope this will help you to solve the described problem.
>
> 1. Map output key and value classes: Write a map output key
> class (Text.class) and value class (CombinedValue.class). The value class
> should be able to hold the values from both files (a.txt and b.txt), as
> shown below.
>
> class CombinedValue implements Writable
> {
>     String name;
>     int age;
>     String address;
>     boolean isLeft; // flag to identify which file the record came from
>     // write(DataOutput) and readFields(DataInput) omitted here
> }
>
> 2. Mapper: Write a map() function that can parse records from both
> files (a.txt, b.txt) and produces the common output key and value class.
>
> 3. Partitioner: Write the partitioner so that all (key, value) pairs
> with the same key are sent to the same reducer.
>
> 4. Reducer: In the reduce() function, you will receive the records from
> both files for each key and can combine them easily.
>
>
> Thanks
> Devaraj
>
>
> ________________________________________
> From: liuzhg [liu...@cernet.com]
> Sent: Tuesday, May 29, 2012 3:45 PM
> To: common-user@hadoop.apache.org
> Subject: How to mapreduce in the scenario
>
> Hi,
>
> I wonder whether Hadoop can effectively solve the following problem:
>
> ==========================================
> input file: a.txt, b.txt
> result: c.txt
>
> a.txt:
> id1,name1,age1,...
> id2,name2,age2,...
> id3,name3,age3,...
> id4,name4,age4,...
>
> b.txt:
> id1,address1,...
> id2,address2,...
> id3,address3,...
>
> c.txt:
> id1,name1,age1,address1,...
> id2,name2,age2,address2,...
> ========================================
>
> I know that a database can do this well,
> but I want to handle it with Hadoop if possible.
> Can Hadoop meet the requirement?
>
> Any suggestions would help. Thank you very much!
>
> Best Regards,
>
> Gump
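
For reference, the reduce-side join that Devaraj describes in steps 1 to 4 can be sketched in plain Java with no Hadoop dependency. This is only an illustration of the idea, not the contrib/datajoin code or a real MapReduce job: the class name ReduceSideJoinSketch, the Tagged value type, and the in-memory TreeMap standing in for Hadoop's shuffle and sort are all assumptions made for the example. It performs an inner join, so an id that appears in only one file produces no output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of a reduce-side join on the id column.
class ReduceSideJoinSketch {

    // Value emitted by the map step: the remaining fields plus a source flag
    // (plays the role of CombinedValue.isLeft in the thread).
    static final class Tagged {
        final boolean isLeft;  // true for a.txt records, false for b.txt records
        final String fields;   // everything after the id

        Tagged(boolean isLeft, String fields) {
            this.isLeft = isLeft;
            this.fields = fields;
        }
    }

    // "Map" step: split a line into (id, tagged value) and file it under its id.
    static void map(String line, boolean isLeft, Map<String, List<Tagged>> groups) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            return; // skip malformed lines that have no id separator
        }
        String id = line.substring(0, comma).trim();
        String rest = line.substring(comma + 1).trim();
        groups.computeIfAbsent(id, k -> new ArrayList<>()).add(new Tagged(isLeft, rest));
    }

    // "Reduce" step: pair every a.txt record with every b.txt record for one id.
    static void reduce(String id, List<Tagged> values, List<String> out) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged v : values) {
            (v.isLeft ? left : right).add(v.fields);
        }
        for (String l : left) {
            for (String r : right) {
                out.add(id + "," + l + "," + r);
            }
        }
    }

    // Driver: the TreeMap groups values by key, as the shuffle would in Hadoop.
    static List<String> join(List<String> aLines, List<String> bLines) {
        Map<String, List<Tagged>> groups = new TreeMap<>();
        for (String line : aLines) {
            map(line, true, groups);
        }
        for (String line : bLines) {
            map(line, false, groups);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : groups.entrySet()) {
            reduce(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("id1,name1,age1", "id2,name2,age2", "id4,name4,age4");
        List<String> b = Arrays.asList("id1,address1", "id2,address2", "id3,address3");
        for (String row : join(a, b)) {
            System.out.println(row); // id1 and id2 rows only; id3 and id4 have no match
        }
    }
}
```

In a real job the grouping by id is done by the framework between map and reduce, so the map and reduce methods above are the only parts that would carry over, rewritten against the Hadoop Mapper and Reducer classes.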