Yes. Hadoop is only for huge dataset computation. It may not be good for small datasets.

On Wed, May 30, 2012 at 6:53 AM, liuzhg <liu...@cernet.com> wrote:

> Hi,
>
> Mike, Nitin, Devaraj, Soumya, samir, Robert
>
> Thank you all for your suggestions.
>
> Actually, I want to know whether Hadoop has any performance advantage over
> a conventional database for solving this kind of problem (joining data).
>
> Best Regards,
>
> Gump
>
>
> On Tue, May 29, 2012 at 6:53 PM, Soumya Banerjee
> <soumya.sbaner...@gmail.com> wrote:
>
> Hi,
>
> You can also try to use the Hadoop Reduce Side Join functionality.
> Look into the contrib/datajoin/hadoop-datajoin-*.jar for the base Map and
> Reduce classes to do the same.
>
> Regards,
> Soumya.
>
>
> On Tue, May 29, 2012 at 4:10 PM, Devaraj k <devara...@huawei.com> wrote:
>
> > Hi Gump,
> >
> > MapReduce fits well for solving these types of problems (joins).
> >
> > I hope this will help you to solve the described problem.
> >
> > 1. Map output key and value classes : Write a map output key
> > class (Text.class) and value class (CombinedValue.class). Here the value
> > class should be able to hold the values from both files (a.txt and
> > b.txt), as shown below.
> >
> > class CombinedValue implements Writable
> > {
> >     String name;
> >     int age;
> >     String address;
> >     boolean isLeft; // flag to identify which file the record came from
> > }
> >
> > 2. Mapper : Write a map() function which can parse records from both
> > files (a.txt, b.txt) and produces the common output key and value class.
> >
> > 3. Partitioner : Write the partitioner in such a way that it sends all
> > the (key, value) pairs having the same key to the same reducer.
> >
> > 4. Reducer : In the reduce() function, you will receive the records from
> > both files and you can combine them easily.
> >
> > Thanks
> > Devaraj
> >
> > ________________________________________
> > From: liuzhg [liu...@cernet.com]
> > Sent: Tuesday, May 29, 2012 3:45 PM
> > To: common-user@hadoop.apache.org
> > Subject: How to mapreduce in the scenario
> >
> > Hi,
> >
> > I wonder whether Hadoop can effectively solve the following problem:
> >
> > ==========================================
> > input file: a.txt, b.txt
> > result: c.txt
> >
> > a.txt:
> > id1,name1,age1,...
> > id2,name2,age2,...
> > id3,name3,age3,...
> > id4,name4,age4,...
> >
> > b.txt:
> > id1,address1,...
> > id2,address2,...
> > id3,address3,...
> >
> > c.txt
> > id1,name1,age1,address1,...
> > id2,name2,age2,address2,...
> > ========================================
> >
> > I know that it can be done well by a database.
> > But I want to handle it with Hadoop if possible.
> > Can Hadoop meet the requirement?
> >
> > Any suggestion can help me. Thank you very much!
> >
> > Best Regards,
> >
> > Gump
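
The four steps Devaraj describes in the quoted message (tag each record with its source file, key it by id, bring equal keys together, combine them in the reducer) can be sketched roughly as follows. This is only a plain-Java simulation of the data flow and uses no Hadoop classes: the names ReduceSideJoinSketch, TaggedValue, mapAndShuffle, reduce and join are made up for illustration, and the in-memory TreeMap stands in for the shuffle and partitioning that the framework would do in a real job. It also assumes an inner join, so ids present in only one file are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of a reduce-side join; hypothetical names, no Hadoop API. */
class ReduceSideJoinSketch {

    /** Tagged value, mirroring the isLeft flag of CombinedValue in the thread. */
    static final class TaggedValue {
        final boolean isLeft; // true: record came from a.txt, false: from b.txt
        final String fields;  // everything after the id, e.g. "name1,age1"

        TaggedValue(boolean isLeft, String fields) {
            this.isLeft = isLeft;
            this.fields = fields;
        }
    }

    /** "Map" step: emit (id, tagged rest of line) and group by id, as the shuffle would. */
    static void mapAndShuffle(List<String> lines, boolean isLeft,
                              Map<String, List<TaggedValue>> groups) {
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip malformed lines that have no id separator
            }
            String id = line.substring(0, comma);
            String rest = line.substring(comma + 1);
            groups.computeIfAbsent(id, k -> new ArrayList<>())
                  .add(new TaggedValue(isLeft, rest));
        }
    }

    /** "Reduce" step: for one id, pair every left record with every right record. */
    static List<String> reduce(String id, List<TaggedValue> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (TaggedValue v : values) {
            (v.isLeft ? left : right).add(v.fields);
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(id + "," + l + "," + r);
            }
        }
        return out; // empty when the id appears in only one file (inner join)
    }

    /** Runs the whole simulated job over the lines of a.txt and b.txt. */
    static List<String> join(List<String> a, List<String> b) {
        Map<String, List<TaggedValue>> groups = new TreeMap<>(); // keys in sorted order
        mapAndShuffle(a, true, groups);
        mapAndShuffle(b, false, groups);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<TaggedValue>> e : groups.entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList(
                "id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4");
        List<String> b = Arrays.asList("id1,address1", "id2,address2", "id3,address3");
        join(a, b).forEach(System.out::println);
    }
}
```

With the sample data from the original question, main prints one joined row each for id1, id2 and id3; id4 is dropped because b.txt has no record for it. In an actual Hadoop job the grouping by key would be done by the framework across reducers rather than by a single in-memory map.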