Re: Join Hadoop Example problem

abhishek sharma Mon, 25 Jan 2010 18:01:47 -0800

What exact command are you using to submit the job? I did not see it
in your e-mail.


Abhishek

On Mon, Jan 25, 2010 at 5:43 PM, Rob Stewart
<[email protected]> wrote:
> Hi there, I'm using Hadoop 0.20.1 and I'm trying to use the Join application
> within the hadoop-*examples.jar. I can't figure out where I'm going wrong.
> It isn't grouping the keys together as I would expect.
> ------------------------
>> bin/hadoop dfs -cat join/a.txt
> AAAAAAAA,a0
> BBBBBBBB,a1
> CCCCCCCC,a2
> CCCCCCCC,a3
>
>> bin/hadoop dfs -cat join/b.txt
> AAAAAAAA,b0
> BBBBBBBB,b1
> BBBBBBBB,b2
> BBBBBBBB,b3
>
>> bin/hadoop dfs -cat join/c.txt
> AAAAAAAA,c0
> BBBBBBBB,c1
> DDDDDDDD,c2
> DDDDDDDD,c3
>
>>
>
> -----*RESULT*-----
>>bin/hadoop dfs -text theOutputs/part-00000
> AAAAAAAA        [a0]
> AAAAAAAA        [b0]
> AAAAAAAA        [c0]
> BBBBBBBB        [c1]
> BBBBBBBB        [a1]
> BBBBBBBB        [b1]
> BBBBBBBB        [b2]
> BBBBBBBB        [b3]
> CCCCCCCC        [a2]
> CCCCCCCC        [a3]
> DDDDDDDD        [c2]
> DDDDDDDD        [c3]
> -----------------------
>
>
> So why has it not grouped all the AAAAAAAA's etc., so that the output
> instead looks like this:
>
> AAAAAAAA        [a0,b0,c0]
> BBBBBBBB        [a1,b1,c1]
> BBBBBBBB        [a1,b2,c1]
> BBBBBBBB        [a1,b3,c1]
> CCCCCCCC        [a2,,]
> CCCCCCCC        [a3,,]
> DDDDDDDD        [,,c2]
> DDDDDDDD        [,,c3]
>
> ?
>
> ---------------------
>
> I have another question. Instead of these Key/Value pairs, what if I
> have two input files list1.txt and list2.txt, both containing a list
> of names, one line per name. I want to JOIN these input files BY the
> names in each list. i.e. I want to create an output file containing a
> list of the names that appear in both the input lists. Is it possible
> to adapt the Join example packaged with Hadoop to implement this?
>
>
> Many thanks,
>
> Rob Stewart
>
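
For reference, the two results described in the quoted message can be
sketched in plain Python. This is only an illustration of the intended
semantics, not the Hadoop Join example's implementation and not a
diagnosis of the problem above. It assumes the expected grouping is a
full outer join on the key (one row per combination of values, with an
empty slot for a file that lacks the key), and that the name lists are
to be intersected. The helper names parse, outer_join and common_names,
and the sample names, are hypothetical.

```python
from itertools import product


def parse(lines):
    """Split 'KEY,value' lines into (key, value) tuples."""
    return [tuple(line.split(",", 1)) for line in lines]


def outer_join(*sources):
    """Full outer join of several (key, value) lists on the key."""
    keys = sorted({key for source in sources for key, _ in source})
    rows = []
    for key in keys:
        # Values for this key from each source; "" marks a source without it.
        per_source = [
            [v for k, v in source if k == key] or [""] for source in sources
        ]
        # One output row per combination of values across the sources.
        for combo in product(*per_source):
            rows.append((key, list(combo)))
    return rows


def common_names(list1, list2):
    """Names present in both lists (an inner join on the name itself)."""
    names1 = {name.strip() for name in list1 if name.strip()}
    names2 = {name.strip() for name in list2 if name.strip()}
    return sorted(names1 & names2)


# Same data as join/a.txt, join/b.txt and join/c.txt in the quoted message.
a = parse(["AAAAAAAA,a0", "BBBBBBBB,a1", "CCCCCCCC,a2", "CCCCCCCC,a3"])
b = parse(["AAAAAAAA,b0", "BBBBBBBB,b1", "BBBBBBBB,b2", "BBBBBBBB,b3"])
c = parse(["AAAAAAAA,c0", "BBBBBBBB,c1", "DDDDDDDD,c2", "DDDDDDDD,c3"])

for key, values in outer_join(a, b, c):
    print(key, "[" + ",".join(values) + "]")

# Hypothetical name lists standing in for list1.txt and list2.txt.
print(common_names(["alice", "bob", "carol"], ["bob", "dave", "alice"]))
```

With the three sample files, outer_join yields eight rows grouped the
same way as the expected output in the quoted message. Whether the
packaged Join example can be configured to produce this depends on the
command and options used, which is why the exact command matters.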
