If you have a huge dataset (on the order of terabytes, or at least a few
GB), then yes: Hadoop has the advantage of being a distributed system and
is much faster. On a smaller set of records it is not as good as an RDBMS.

On Wed, May 30, 2012 at 6:53 AM, liuzhg <liu...@cernet.com> wrote:

> Hi,
>
> Mike, Nitin, Devaraj, Soumya, samir, Robert
>
> Thank you all for your suggestions.
>
> Actually, I want to know whether Hadoop has any performance advantage
> over a conventional database for solving this kind of problem (joining
> data).
>
> Best Regards,
>
> Gump
>
>
> On Tue, May 29, 2012 at 6:53 PM, Soumya Banerjee
> <soumya.sbaner...@gmail.com> wrote:
>
> Hi,
>
> You can also try the Hadoop reduce-side join functionality.
> Look into contrib/datajoin/hadoop-datajoin-*.jar for the base map and
> reduce classes for this.
>
> Regards,
> Soumya.
>
>
> On Tue, May 29, 2012 at 4:10 PM, Devaraj k <devara...@huawei.com> wrote:
>
> > Hi Gump,
> >
> > MapReduce fits well for solving this type of problem (joins).
> >
> > I hope this helps you solve the problem you described.
> >
> > 1. Map output key and value classes: Use Text.class as the map output
> > key class and write a value class (CombinedValue.class). The value
> > class should be able to hold the values from both files (a.txt and
> > b.txt), as shown below.
> >
> > class CombinedValue implements Writable // Writable is the interface; WritableComparator is a class
> > {
> >     String name;
> >     int age;
> >     String address;
> >     boolean isLeft; // flag to identify from which file
> >     // write(DataOutput) and readFields(DataInput) must also be implemented
> > }
> >
> > 2. Mapper: Write a map() function that can parse records from both
> > files (a.txt, b.txt) and produce the common output key and value class.
> >
> > 3. Partitioner: Write the partitioner so that it sends all (key, value)
> > pairs that have the same key to the same reducer.
> >
> > 4. Reducer: In the reduce() function, you will receive the records from
> > both files for a given key and can combine them easily.
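
To make the four steps above concrete, here is a rough sketch of the same
reduce-side join written in plain Java. It does not use the Hadoop API at
all: the in-memory map stands in for the shuffle, and the class and method
names (ReduceSideJoinSketch, Tagged, map, reduce, join) are made up for
illustration. It assumes the id is the first comma-separated field and
performs an inner join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join; no Hadoop classes are used.
class ReduceSideJoinSketch {

    // Tagged record; isLeft plays the role of the flag in CombinedValue.
    static final class Tagged {
        final boolean isLeft;  // true: record came from a.txt, false: from b.txt
        final String fields;   // everything after the id
        Tagged(boolean isLeft, String fields) {
            this.isLeft = isLeft;
            this.fields = fields;
        }
    }

    // "Map" step: split each line at the first comma and emit (id, tagged value).
    // Adding to the per-id list stands in for the shuffle grouping by key.
    static void map(List<String> lines, boolean isLeft, Map<String, List<Tagged>> groups) {
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip lines without an id separator
            }
            String id = line.substring(0, comma);
            groups.computeIfAbsent(id, k -> new ArrayList<>())
                  .add(new Tagged(isLeft, line.substring(comma + 1)));
        }
    }

    // "Reduce" step: for each id, pair every left record with every right record.
    // Ids present in only one file produce no output (inner join).
    static List<String> reduce(Map<String, List<Tagged>> groups) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> entry : groups.entrySet()) {
            for (Tagged left : entry.getValue()) {
                if (!left.isLeft) {
                    continue;
                }
                for (Tagged right : entry.getValue()) {
                    if (!right.isLeft) {
                        out.add(entry.getKey() + "," + left.fields + "," + right.fields);
                    }
                }
            }
        }
        return out;
    }

    // Runs both "map" passes, then the "reduce" pass; TreeMap keeps ids sorted.
    static List<String> join(List<String> a, List<String> b) {
        Map<String, List<Tagged>> groups = new TreeMap<>();
        map(a, true, groups);
        map(b, false, groups);
        return reduce(groups);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("id1,name1,age1", "id2,name2,age2", "id4,name4,age4");
        List<String> b = Arrays.asList("id1,address1", "id2,address2");
        for (String row : join(a, b)) {
            System.out.println(row);
        }
    }
}
```

In a real job the grouping is done by the framework. Hadoop's default
HashPartitioner already routes equal keys to the same reducer, so a custom
partitioner should only be needed if the map output key carries more than
the join id.
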
> >
> > Thanks
> > Devaraj
> >
> > ________________________________________
> > From: liuzhg [liu...@cernet.com]
> > Sent: Tuesday, May 29, 2012 3:45 PM
> > To: common-user@hadoop.apache.org
> > Subject: How to mapreduce in the scenario
> >
> > Hi,
> >
> > I wonder whether Hadoop can effectively solve the following problem:
> >
> > ==========================================
> > input file: a.txt, b.txt
> > result: c.txt
> >
> > a.txt:
> > id1,name1,age1,...
> > id2,name2,age2,...
> > id3,name3,age3,...
> > id4,name4,age4,...
> >
> > b.txt:
> > id1,address1,...
> > id2,address2,...
> > id3,address3,...
> >
> > c.txt
> > id1,name1,age1,address1,...
> > id2,name2,age2,address2,...
> > ========================================
> >
> > I know that a database can do this well, but I want to handle it with
> > Hadoop if possible.
> > Can Hadoop meet this requirement?
> >
> > Any suggestions would help. Thank you very much!
> >
> > Best Regards,
> >
> > Gump
> >

--
Nitin Pawar