What is the exact command you are using to submit the job? I did not see it in your e-mail.
Abhishek

On Mon, Jan 25, 2010 at 5:43 PM, Rob Stewart <[email protected]> wrote:
> Hi there, I'm using Hadoop 0.20.1 and I'm trying to use the Join application
> within the hadoop-*examples.jar . I can't seem to figure it out, where am I
> going wrong? It isn't grouping the keys together, as I would expect....
> ------------------------
>> bin/hadoop dfs -cat join/a.txt
> AAAAAAAA,a0
> BBBBBBBB,a1
> CCCCCCCC,a2
> CCCCCCCC,a3
>
>> bin/hadoop dfs -cat join/b.txt
> AAAAAAAA,b0
> BBBBBBBB,b1
> BBBBBBBB,b2
> BBBBBBBB,b3
>
>> bin/hadoop dfs -cat join/c.txt
> AAAAAAAA,c0
> BBBBBBBB,c1
> DDDDDDDD,c2
> DDDDDDDD,c3
>
>>
>
> -----*RESULT*-----
>>bin/hadoop dfs -text theOutputs/part-00000
> AAAAAAAA [a0]
> AAAAAAAA [b0]
> AAAAAAAA [c0]
> BBBBBBBB [c1]
> BBBBBBBB [a1]
> BBBBBBBB [b1]
> BBBBBBBB [b2]
> BBBBBBBB [b3]
> CCCCCCCC [a2]
> CCCCCCCC [a3]
> DDDDDDDD [c2]
> DDDDDDDD [c3]
> -----------------------
>
>
> So, why has it not grouped all the AAAAAAAA's etc so that it, instead looks
> like this:
>
> AAAAAAAA [a0,b0,c0]
> BBBBBBBB [a1,b1,c1]
> BBBBBBBB [a1,b2,c1]
> BBBBBBBB [a1,b3,c1]
> CCCCCCCC [a2,,]
> CCCCCCCC [a3,,]
> DDDDDDDD [,,c2]
> DDDDDDDD [,,c3]
>
> ?
>
> ---------------------
>
> I have another question. Instead of these Key/Value pairs, what if I
> have two input files list1.txt and list2.txt, both containing a list
> of names, one line per name. I want to JOIN these input files BY the
> names in each list. i.e. I want to create an output file containing a
> list of the names that appear in both the input lists. Is it possible
> to adapt the Join example packaged with Hadoop to implement this?
>
>
> Many thanks,
>
> Rob Stewart
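The two behaviours asked about can be illustrated without a cluster. What follows is a minimal in-memory sketch in plain Java. It is not the code of the Hadoop Join example and uses no Hadoop classes; the class `JoinSketch`, its methods, and the sample names are hypothetical. It assumes each record is a `key,value` line split at the first comma, as in a.txt, b.txt and c.txt above. `outerJoin` builds, for every key, the cross product of one value per source, with an empty slot where a source has no record for that key, which is the grouping expected above. `intersect` addresses the second question by keeping only names present in every list, i.e. an inner join in which the name itself is the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory illustration of join semantics; not Hadoop code.
public class JoinSketch {

    // Groups "key,value" lines by key, keeping a separate value list per source.
    // TreeMap keeps keys sorted, similar to the sorted keys of a MapReduce output.
    static Map<String, List<List<String>>> groupByKey(List<List<String>> sources) {
        Map<String, List<List<String>>> grouped = new TreeMap<>();
        for (int s = 0; s < sources.size(); s++) {
            for (String line : sources.get(s)) {
                int comma = line.indexOf(',');
                if (comma < 0) continue; // skip lines without a key,value separator
                String key = line.substring(0, comma);
                String value = line.substring(comma + 1);
                List<List<String>> perSource = grouped.get(key);
                if (perSource == null) {
                    perSource = new ArrayList<>();
                    for (int i = 0; i < sources.size(); i++) perSource.add(new ArrayList<>());
                    grouped.put(key, perSource);
                }
                perSource.get(s).add(value);
            }
        }
        return grouped;
    }

    // Outer join: for each key, emit every combination of one value per source.
    // A source with no record for the key contributes an empty string to the tuple.
    static List<String> outerJoin(List<List<String>> sources) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> e : groupByKey(sources).entrySet()) {
            List<List<String>> tuples = new ArrayList<>();
            tuples.add(new ArrayList<>());
            for (List<String> values : e.getValue()) {
                List<String> slot = values.isEmpty() ? Arrays.asList("") : values;
                List<List<String>> next = new ArrayList<>();
                for (List<String> partial : tuples) {
                    for (String v : slot) {
                        List<String> t = new ArrayList<>(partial);
                        t.add(v);
                        next.add(t);
                    }
                }
                tuples = next;
            }
            for (List<String> t : tuples) {
                out.add(e.getKey() + " [" + String.join(",", t) + "]");
            }
        }
        return out;
    }

    // Intersection of name lists: a name is kept only if every list contains it.
    static Set<String> intersect(List<List<String>> lists) {
        Set<String> common = new TreeSet<>(lists.get(0));
        for (List<String> l : lists.subList(1, lists.size())) common.retainAll(l);
        return common;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("AAAAAAAA,a0", "BBBBBBBB,a1", "CCCCCCCC,a2", "CCCCCCCC,a3");
        List<String> b = Arrays.asList("AAAAAAAA,b0", "BBBBBBBB,b1", "BBBBBBBB,b2", "BBBBBBBB,b3");
        List<String> c = Arrays.asList("AAAAAAAA,c0", "BBBBBBBB,c1", "DDDDDDDD,c2", "DDDDDDDD,c3");
        outerJoin(Arrays.asList(a, b, c)).forEach(System.out::println);

        // Hypothetical stand-ins for the contents of list1.txt and list2.txt.
        List<String> list1 = Arrays.asList("alice", "bob", "carol");
        List<String> list2 = Arrays.asList("bob", "dave", "carol");
        System.out.println(intersect(Arrays.asList(list1, list2)));
    }
}
```

In a real MapReduce job the grouping would instead happen in the shuffle: a mapper could emit (name, source tag) pairs and a reducer could output a name only if it saw every tag. The map-side join used by the Join example is a different mechanism; as far as I know it expects all of its inputs to be sorted by key and partitioned identically.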
