Re: Join Hadoop Example problem

Rob Stewart Mon, 25 Jan 2010 18:26:09 -0800

Good point, I missed that. It is:

> bin/hadoop jar hadoop-*-examples.jar join \
    -D key.value.separator.in.input.line=',' \
    -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
    -outKey org.apache.hadoop.io.Text \
    join/ theOutputs




Rob


2010/1/26 abhishek sharma <[email protected]>

> What is the exact command that you are giving when submitting the
> jobs? I did not see it in your e-mail.
>
> Abhishek
>
> On Mon, Jan 25, 2010 at 5:43 PM, Rob Stewart
> <[email protected]> wrote:
> > Hi there, I'm using Hadoop 0.20.1 and I'm trying to use the Join
> > application within the hadoop-*examples.jar. I can't seem to figure it
> > out. Where am I going wrong? It isn't grouping the keys together, as I
> > would expect.
> > ------------------------
> >> bin/hadoop dfs -cat join/a.txt
> > AAAAAAAA,a0
> > BBBBBBBB,a1
> > CCCCCCCC,a2
> > CCCCCCCC,a3
> >
> >> bin/hadoop dfs -cat join/b.txt
> > AAAAAAAA,b0
> > BBBBBBBB,b1
> > BBBBBBBB,b2
> > BBBBBBBB,b3
> >
> >> bin/hadoop dfs -cat join/c.txt
> > AAAAAAAA,c0
> > BBBBBBBB,c1
> > DDDDDDDD,c2
> > DDDDDDDD,c3
> >
> >>
> >
> > -----*RESULT*-----
> >>bin/hadoop dfs -text theOutputs/part-00000
> > AAAAAAAA        [a0]
> > AAAAAAAA        [b0]
> > AAAAAAAA        [c0]
> > BBBBBBBB        [c1]
> > BBBBBBBB        [a1]
> > BBBBBBBB        [b1]
> > BBBBBBBB        [b2]
> > BBBBBBBB        [b3]
> > CCCCCCCC        [a2]
> > CCCCCCCC        [a3]
> > DDDDDDDD        [c2]
> > DDDDDDDD        [c3]
> > -----------------------
> >
> >
> > So, why has it not grouped all the AAAAAAAA's etc. so that it instead
> > looks like this:
> >
> > AAAAAAAA        [a0,b0,c0]
> > BBBBBBBB        [a1,b1,c1]
> > BBBBBBBB        [a1,b2,c1]
> > BBBBBBBB        [a1,b3,c1]
> > CCCCCCCC        [a2,,]
> > CCCCCCCC        [a3,,]
> > DDDDDDDD        [,,c2]
> > DDDDDDDD        [,,c3]
> >
> > ?
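
For reference, the expected output above has the shape of a full outer join on the key: for each key, every combination of values from a.txt, b.txt and c.txt, with an empty slot where a file has no entry for that key. The following is a minimal sketch of those semantics in plain Java, not the Hadoop Join example's own code. The class and method names (OuterJoinSketch, group, outerJoin) are made up for illustration, and it assumes every input line contains the ',' separator. I believe the bundled example also takes a -joinOp option (inner, outer or override), but check the usage message of your version before relying on that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Hadoop.
class OuterJoinSketch {

    // Groups one source's "key,value" lines by key, keeping the value order.
    // Assumes every line contains the ',' separator.
    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> byKey = new TreeMap<>();
        for (String line : lines) {
            String[] kv = line.split(",", 2);
            byKey.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return byKey;
    }

    // Full outer join: per key, the cross product of each source's values.
    // A source with no value for the key contributes a single empty slot.
    static List<String> outerJoin(List<List<String>> sources) {
        List<Map<String, List<String>>> grouped = new ArrayList<>();
        TreeSet<String> keys = new TreeSet<>();
        for (List<String> source : sources) {
            Map<String, List<String>> g = group(source);
            grouped.add(g);
            keys.addAll(g.keySet());
        }
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            List<List<String>> tuples = new ArrayList<>();
            tuples.add(new ArrayList<>());
            for (Map<String, List<String>> g : grouped) {
                List<String> values = g.getOrDefault(key, Arrays.asList(""));
                List<List<String>> next = new ArrayList<>();
                for (List<String> tuple : tuples) {
                    for (String value : values) {
                        List<String> extended = new ArrayList<>(tuple);
                        extended.add(value);
                        next.add(extended);
                    }
                }
                tuples = next;
            }
            for (List<String> tuple : tuples) {
                // Key and tuple are separated by a tab.
                out.add(key + "\t[" + String.join(",", tuple) + "]");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("AAAAAAAA,a0", "BBBBBBBB,a1", "CCCCCCCC,a2", "CCCCCCCC,a3");
        List<String> b = Arrays.asList("AAAAAAAA,b0", "BBBBBBBB,b1", "BBBBBBBB,b2", "BBBBBBBB,b3");
        List<String> c = Arrays.asList("AAAAAAAA,c0", "BBBBBBBB,c1", "DDDDDDDD,c2", "DDDDDDDD,c3");
        for (String line : outerJoin(Arrays.asList(a, b, c))) {
            System.out.println(line);
        }
    }
}
```

Run on the three files quoted above, this prints the eight lines listed as the expected result.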
> >
> > ---------------------
> >
> > I have another question. Instead of these Key/Value pairs, what if I
> > have two input files list1.txt and list2.txt, both containing a list
> > of names, one line per name. I want to JOIN these input files BY the
> > names in each list. i.e. I want to create an output file containing a
> > list of the names that appear in both the input lists. Is it possible
> > to adapt the Join example packaged with Hadoop to implement this?
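
What is being asked for here amounts to an inner join where the name is the key and there is no value, i.e. the intersection of the two lists. As a rough sketch of the intended result only, in plain Java and not wired into the Hadoop Join example, it might look like the following. The class name NameIntersectionSketch, the method commonNames and the sample names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Hadoop.
class NameIntersectionSketch {

    // Returns the names present in both lists, sorted, without duplicates.
    static List<String> commonNames(List<String> list1, List<String> list2) {
        TreeSet<String> common = new TreeSet<>(list1);
        common.retainAll(new HashSet<>(list2));
        return new ArrayList<>(common);
    }

    public static void main(String[] args) {
        // Stand-ins for the lines of list1.txt and list2.txt.
        List<String> list1 = Arrays.asList("alice", "bob", "carol");
        List<String> list2 = Arrays.asList("bob", "dave", "alice");
        for (String name : commonNames(list1, list2)) {
            System.out.println(name);
        }
    }
}
```

Whether the packaged Join example can be driven to produce this directly is not something this sketch settles; it only pins down the output being asked for.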
> >
> >
> > Many thanks,
> >
> > Rob Stewart
> >
>
