Right, that was the first option I tried and it fails there as well. Maybe I need to step back and ask a higher-level question: does anyone have a full, step-by-step example of using a reduce-side join in an M/R job? Preferably using the contrib/DataJoin classes, but I'll be happy with whatever example I can get.
I'd love to see the actual code and then how it's kicked off on the command line so I can try it on my end as a prototype. I must be doing something wrong, but don't know what it is.

Thanks.

On Mon, Mar 29, 2010 at 8:31 AM, Jones, Nick <nick.jo...@amd.com> wrote:

> M B,
> I'm not sure about the -libjars argument but 'hadoop jar' is expecting the
> jarfile immediately afterwards: hadoop jar jarFile [mainClass] args...
>
> Nick Jones
>
> -----Original Message-----
> From: M B [mailto:machac...@gmail.com]
> Sent: Monday, March 29, 2010 10:26 AM
> To: common-user@hadoop.apache.org
> Subject: Re: ClassNotFoundException with contrib/join example
>
> Sorry, I should have mentioned that I tried that as well and it also gives
> an error:
>
> $ <p...@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
> /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> datajoin/output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.util.zip.ZipException: error in opening zip file
>     at java.util.zip.ZipFile.open(Native Method)
>     at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>     at java.util.jar.JarFile.<init>(JarFile.java:133)
>     at java.util.jar.JarFile.<init>(JarFile.java:70)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> Has something changed or is my environment not set up correctly?
> Appreciate any help.
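The stack trace above is consistent with what Nick describes: the first token after `hadoop jar` is opened as a jar file, so `-libjars` in that position is treated as a file name. Here is a minimal Java sketch of that behaviour, inferred only from the trace and not copied from Hadoop's RunJar source. The class name `JarArgOrderSketch` and the helper `openJobJar` are made up for illustration.

```java
import java.io.IOException;
import java.util.jar.JarFile;

class JarArgOrderSketch {
    // Hypothetical helper: treats args[0] as the job jar and tries to open it,
    // wrapping any failure in the same style of message seen in the trace.
    static String openJobJar(String[] args) throws IOException {
        String fileName = args[0];
        try (JarFile jar = new JarFile(fileName)) {
            return fileName;
        } catch (IOException e) {
            throw new IOException("Error opening job jar: " + fileName, e);
        }
    }

    public static void main(String[] args) {
        // Same token order as the failing command: the option comes before the jar.
        String[] cmd = {"-libjars", "./samplejoin.jar", "hadoop-0.20.2-datajoin.jar"};
        try {
            openJobJar(cmd);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Assuming no file literally named `-libjars` exists in the working directory, this prints `Error opening job jar: -libjars`, which matches the first line of the exception in the thread.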
>
> On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Then use the syntax given by
> > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html :
> >
> > $ bin/hadoop jar -libjars ./samplejoin.jar
> > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
> >
> > On Fri, Mar 26, 2010 at 5:10 PM, M B <machac...@gmail.com> wrote:
> >
> > > Sorry, but where exactly do I include the libjars option? I tried to put it
> > > where you stated (after the DataJoinJob class), but it just comes back with
> > > usage information (as if the option is not valid):
> > > $ <p...@hadoop01:~/hadoop_tests$> hadoop jar
> > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > datajoin/input datajoin/output Text 1
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > *usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
> > > mapper_class reducer_class map_output_value_class output_value_class
> > > [maxNumOfValuesPerGroup [descriptionOfJob]]]*
> > >
> > > It seems like it's not taking the option for some reason, like it's failing
> > > an argument check in DataJoinJob - does that not use the standard args or
> > > something?
> > >
> > > On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > DataJoinJob is contained in hadoop-0.20.2-datajoin.jar which is in your
> > > > HADOOP_CLASSPATH
> > > >
> > > > I think you should specify samplejoin.jar using -libjars instead of putting
> > > > it directly after jar command:
> > > > hadoop jar hadoop-0.20.2-datajoin.jar
> > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > > ... (same as your example)
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Mar 26, 2010 at 3:24 PM, M B <machac...@gmail.com> wrote:
> > > >
> > > > > I may be having a setup issue with classpaths, would appreciate some help.
> > > > >
> > > > > I created a jar with all the Sample* classes in contrib/DataJoin. Here is
> > > > > the listing of my samplejoin.jar file:
> > > > > " zip.vim version v22
> > > > > " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
> > > > > " Select a file with cursor and press ENTER
> > > > > META-INF/
> > > > > META-INF/MANIFEST.MF
> > > > > org/
> > > > > org/apache/
> > > > > org/apache/hadoop/
> > > > > org/apache/hadoop/contrib/
> > > > > org/apache/hadoop/contrib/utils/
> > > > > org/apache/hadoop/contrib/utils/join/
> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
> > > > > org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
> > > > >
> > > > > When I go to run this, things start to run, but every Map try errors out
> > > > > with:
> > > > > "java.lang.RuntimeException: java.lang.ClassNotFoundException:
> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
> > > > >
> > > > > Here is the command:
> > > > > hadoop jar ./samplejoin.jar
> > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob
> > > > > datajoin/input datajoin/output Text 1
> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > >
> > > > > This is a new install of 0.20.2.
> > > > >
> > > > > HADOOP_CLASSPATH is set
> > > > > to: /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > Any help would be appreciated.
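Since the map tasks fail with a ClassNotFoundException for one of the Sample* classes, one quick sanity check is whether a given jar actually contains a given class. The sketch below does that with the standard `java.util.jar` API. The names `JarClassCheck` and `containsClass` are hypothetical, chosen for illustration. It only inspects a jar on local disk and says nothing about whether that jar is shipped to the task nodes.

```java
import java.io.IOException;
import java.util.jar.JarFile;

class JarClassCheck {
    // Returns true if the jar has an entry for the given fully qualified class name.
    static boolean containsClass(String jarPath, String className) throws IOException {
        // Class a.b.C is stored in a jar as the entry a/b/C.class.
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        System.out.println(containsClass(args[0], args[1]) ? "present" : "missing");
    }
}
```

Run against each jar in turn, for example `java JarClassCheck ./samplejoin.jar org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput`, this shows which jar holds which class. Going by the listing quoted in the thread, samplejoin.jar should report `present` for that class.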