Good point, I missed that. It is:

> bin/hadoop jar hadoop-*-examples.jar join -D key.value.separator.in.input.line=',' -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat -outKey org.apache.hadoop.io.Text join/ theOutputs
Rob

2010/1/26 abhishek sharma <[email protected]>

> What is the exact command that you are giving when submitting the
> jobs? I did not see it in your e-mail.
>
> Abhishek
>
> On Mon, Jan 25, 2010 at 5:43 PM, Rob Stewart
> <[email protected]> wrote:
> > Hi there, I'm using Hadoop 0.20.1 and I'm trying to use the Join application
> > within the hadoop-*examples.jar . I can't seem to figure it out, where am I
> > going wrong? It isn't grouping the keys together, as I would expect....
> > ------------------------
> >> bin/hadoop dfs -cat join/a.txt
> > AAAAAAAA,a0
> > BBBBBBBB,a1
> > CCCCCCCC,a2
> > CCCCCCCC,a3
> >
> >> bin/hadoop dfs -cat join/b.txt
> > AAAAAAAA,b0
> > BBBBBBBB,b1
> > BBBBBBBB,b2
> > BBBBBBBB,b3
> >
> >> bin/hadoop dfs -cat join/c.txt
> > AAAAAAAA,c0
> > BBBBBBBB,c1
> > DDDDDDDD,c2
> > DDDDDDDD,c3
> >
> >>
> >
> > -----*RESULT*-----
> >>bin/hadoop dfs -text theOutputs/part-00000
> > AAAAAAAA [a0]
> > AAAAAAAA [b0]
> > AAAAAAAA [c0]
> > BBBBBBBB [c1]
> > BBBBBBBB [a1]
> > BBBBBBBB [b1]
> > BBBBBBBB [b2]
> > BBBBBBBB [b3]
> > CCCCCCCC [a2]
> > CCCCCCCC [a3]
> > DDDDDDDD [c2]
> > DDDDDDDD [c3]
> > -----------------------
> >
> >
> > So, why has it not grouped all the AAAAAAAA's etc so that it, instead looks
> > like this:
> >
> > AAAAAAAA [a0,b0,c0]
> > BBBBBBBB [a1,b1,c1]
> > BBBBBBBB [a1,b2,c1]
> > BBBBBBBB [a1,b3,c1]
> > CCCCCCCC [a2,,]
> > CCCCCCCC [a3,,]
> > DDDDDDDD [,,c2]
> > DDDDDDDD [,,c3]
> >
> > ?
> >
> > ---------------------
> >
> > I have another question. Instead of these Key/Value pairs, what if I
> > have two input files list1.txt and list2.txt, both containing a list
> > of names, one line per name. I want to JOIN these input files BY the
> > names in each list. i.e. I want to create an output file containing a
> > list of the names that appear in both the input lists. Is it possible
> > to adapt the Join example packaged with Hadoop to implement this?
> >
> >
> > Many thanks,
> >
> > Rob Stewart
> >
>
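
To make the expected grouping concrete, here is a minimal sketch in plain Python (not Hadoop, and not how the Join example is implemented) of the outer join described in the quoted message: for each key, it takes the cross product of the values from each file and leaves an empty slot where a file has no value for that key. The function names (parse, outer_join, format_row, common_names) are hypothetical, and the names passed to common_names are made up, since the contents of list1.txt and list2.txt are not given in the thread.

```python
from collections import defaultdict
from itertools import product


def parse(lines, sep=","):
    """Split 'key<sep>value' lines into (key, value) pairs."""
    return [tuple(line.split(sep, 1)) for line in lines if line.strip()]


def outer_join(*sources):
    """Outer-join several lists of (key, value) pairs by key.

    For every key seen in any source, emit one row per combination of
    values across the sources. A source with no value for the key
    contributes a single empty string, which yields rows like [a2,,].
    """
    grouped = [defaultdict(list) for _ in sources]
    for index, pairs in enumerate(sources):
        for key, value in pairs:
            grouped[index][key].append(value)

    rows = []
    for key in sorted(set().union(*(g.keys() for g in grouped))):
        columns = [g.get(key, [""]) for g in grouped]
        for combo in product(*columns):
            rows.append((key, list(combo)))
    return rows


def format_row(key, values):
    """Render a row in the same shape as the expected output."""
    return f"{key} [{','.join(values)}]"


def common_names(list1, list2):
    """Names present in both lists (the second question: a join on the name alone)."""
    return sorted(set(list1) & set(list2))


# Contents of join/a.txt, join/b.txt and join/c.txt from the message.
a = parse(["AAAAAAAA,a0", "BBBBBBBB,a1", "CCCCCCCC,a2", "CCCCCCCC,a3"])
b = parse(["AAAAAAAA,b0", "BBBBBBBB,b1", "BBBBBBBB,b2", "BBBBBBBB,b3"])
c = parse(["AAAAAAAA,c0", "BBBBBBBB,c1", "DDDDDDDD,c2", "DDDDDDDD,c3"])

joined = outer_join(a, b, c)
for key, values in joined:
    print(format_row(key, values))

# Hypothetical name lists, for illustration only.
print(common_names(["alice", "bob", "carol"], ["bob", "dave", "alice"]))
```

Run against the three sample files, this prints the eight rows listed as the expected output. The name-list case is the same idea with the whole line as the key and no value, keeping only keys that occur in every input, which is what an inner join would return.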
