Re: How to mapreduce in the scenario

liuzhg Tue, 29 May 2012 18:21:05 -0700

Hi,

Mike, Nitin, Devaraj, Soumya, samir, Robert


Thank you all for your suggestions.

Actually, I want to know whether Hadoop has any performance advantage over a
conventional database for this kind of problem (joining data).

 

Best Regards,

Gump


On Tue, May 29, 2012 at 6:53 PM, Soumya Banerjee
<[email protected]> wrote:

Hi,

You can also try Hadoop's reduce-side join functionality.
Look in contrib/datajoin/hadoop-datajoin-*.jar for the base map and
reduce classes that implement it.

Regards,
Soumya.


On Tue, May 29, 2012 at 4:10 PM, Devaraj k <[email protected]> wrote:

> Hi Gump,
>
>   MapReduce is a good fit for this type of problem (joins).
>
> I hope this helps you solve the problem you described.
>
> 1. Map output key and value classes : Use Text.class as the map output
> key class and write a value class (CombinedValue.class). The value class
> should be able to hold the values from both files (a.txt and b.txt), as
> shown below.
>
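> // A value class only needs Writable; WritableComparator is a class used
> // for comparing keys, not an interface that can be implemented.
> class CombinedValue implements Writable
> {
>   String name;
>   int age;
>   String address;
>   boolean isLeft; // flag to identify from which file
>   // write(DataOutput) and readFields(DataInput) must also be implemented
> }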
>
> 2. Mapper : Write a map() function that can parse records from both
> files (a.txt, b.txt) and produce the common output key and value classes.
>
> 3. Partitioner : Write the partitioner so that all (key, value) pairs
> with the same key are sent to the same reducer.
>
> 4. Reducer : In the reduce() function, you will receive the records for
> each key from both files, and you can combine them easily.
>
>
> Thanks
> Devaraj
>
>
> ________________________________________
> From: liuzhg [[email protected]]
> Sent: Tuesday, May 29, 2012 3:45 PM
> To: [email protected]
> Subject: How to mapreduce in the scenario
>
> Hi,
>
> I wonder whether Hadoop can effectively solve the following problem:
>
> ==========================================
> input file: a.txt, b.txt
> result: c.txt
>
> a.txt:
> id1,name1,age1,...
> id2,name2,age2,...
> id3,name3,age3,...
> id4,name4,age4,...
>
> b.txt:
> id1,address1,...
> id2,address2,...
> id3,address3,...
>
> c.txt:
> id1,name1,age1,address1,...
> id2,name2,age2,address2,...
> ========================================
>
> I know that this can be done well with a database.
> But I want to handle it with Hadoop if possible.
> Can Hadoop meet this requirement?
>
> Any suggestions would be helpful. Thank you very much!
>
> Best Regards,
>
> Gump
>
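
For reference, the four steps in Devaraj's reply can be sketched in plain Java without any Hadoop classes. This is only a simulation of the idea under some assumptions: the class name ReduceSideJoinSketch, the Tagged record (a stand-in for CombinedValue), and the in-memory TreeMap that plays the role of the shuffle are all made up for illustration, and the join is assumed to be an inner join on the first comma-separated field. In a real job the grouping by key is done by the framework, and the default HashPartitioner already sends equal keys to the same reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join; no Hadoop API is used here.
class ReduceSideJoinSketch {

    // Hypothetical stand-in for CombinedValue: the rest of the record
    // plus a flag saying which input file it came from.
    static final class Tagged {
        final boolean isLeft;  // true: from a.txt, false: from b.txt
        final String fields;   // everything after the id
        Tagged(boolean isLeft, String fields) {
            this.isLeft = isLeft;
            this.fields = fields;
        }
    }

    // "Map" step: split each line into (id, tagged remainder) and group
    // the values by id, which is what the shuffle does in a real job.
    static void map(List<String> lines, boolean isLeft,
                    Map<String, List<Tagged>> shuffle) {
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip lines without an id separator
            }
            String id = line.substring(0, comma);
            String rest = line.substring(comma + 1);
            shuffle.computeIfAbsent(id, k -> new ArrayList<>())
                   .add(new Tagged(isLeft, rest));
        }
    }

    // "Reduce" step: for each id, pair every a.txt record with every
    // b.txt record. Ids present in only one file produce no output.
    static List<String> reduce(Map<String, List<Tagged>> shuffle) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            for (Tagged left : e.getValue()) {
                if (!left.isLeft) {
                    continue;
                }
                for (Tagged right : e.getValue()) {
                    if (!right.isLeft) {
                        out.add(e.getKey() + "," + left.fields
                                + "," + right.fields);
                    }
                }
            }
        }
        return out;
    }

    // Runs both steps over the two inputs and returns the joined lines.
    static List<String> join(List<String> a, List<String> b) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        map(a, true, shuffle);
        map(b, false, shuffle);
        return reduce(shuffle);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("id1,name1,age1", "id2,name2,age2",
                                       "id3,name3,age3", "id4,name4,age4");
        List<String> b = Arrays.asList("id1,address1", "id2,address2",
                                       "id3,address3");
        for (String line : join(a, b)) {
            System.out.println(line);
        }
    }
}
```

With the sample rows from the original question (minus the elided "..." fields), id4 has no match in b.txt and is dropped. Under the inner-join assumption id3 is also emitted, since it appears in both files, although the c.txt listing in the question shows only the first two rows.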
