Hi all,
I am migrating from ant builds to Maven. I am brand new to Maven and do
not yet understand many parts of it.
Problem: I have a perfectly working map-reduce program (built with ant).
This program needs an external jar file (json-rpc-1.0.jar). So, when
I run the program, I do the following and get the expected output:
$ hadoop jar jar/myHadoopProgram.jar -libjars ../lib/json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output/
(note that I include the external jar file by the "-libjars" option as
mentioned in the "Hadoop: The Definitive Guide 2nd Edition" - page 253).
Everything is fine with my ant build.
Now I am moving on to Maven. I had some trouble getting my pom.xml right. I
am still unsure if it is right, but it builds "successfully" (the resulting
jar file has the class files of my program). The essential part of my
pom.xml is the following two dependencies (a complete pom.xml is at the end
of this email).
<!-- org.json.* -->
<dependency>
  <groupId>com.metaparadigm</groupId>
  <artifactId>json-rpc</artifactId>
  <version>1.0</version>
</dependency>
<!-- org.apache.hadoop.* -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
  <scope>provided</scope>
</dependency>
I try to run it like this:
$ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar com.ABC.MyHadoopProgram /usr/PD/input/sample22.json /usr/PD/output
Exception in thread "main" java.lang.ClassNotFoundException: -libjars
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:179)
$
Then I thought that maybe it is not necessary to include the class name. So I
ran the following command instead:
$ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output
Exception in thread "main" java.lang.ClassNotFoundException: -libjars
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:179)
$
Question: What am I doing wrong? Since I am new to Maven, I may be missing
some key concepts. What actually happens at build time when my Java program
imports org.json.JSONArray and org.json.JSONObject? I suppose the dependency
is used only for compilation and does not get "embedded" into the final jar.
Am I right?
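
A possible reading of the stack trace, offered as a sketch and not a confirmed diagnosis: "hadoop jar" treats the first argument after the jar path as the main class name when the jar's manifest has no Main-Class entry, which would explain why it tries to load a class called "-libjars". The ant-built jar presumably declared Main-Class in its manifest, since the ant command works without a class name. The fragment below shows one way to declare it from Maven, assuming com.ABC.MyHadoopProgram (the class name used in the first Maven command) is the entry point. The plugin version is deliberately left out and should be pinned in a real build.

```xml
<!-- Sketch only: goes inside <project>; merge with any existing <build> section. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <!-- Add a <version> here; omitted because the right one depends on the setup. -->
      <configuration>
        <archive>
          <manifest>
            <!-- Assumption: this is the class containing main(). -->
            <mainClass>com.ABC.MyHadoopProgram</mainClass>
          </manifest>
        </archive>
      </configuration>
    </plugin>
  </plugins>
</build>
```

With Main-Class in the manifest, the command can keep the same shape as the ant one (jar path, then -libjars, then input and output). Without it, the class name has to come directly after the jar path, before -libjars.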
I want to either bundle up the external jar(s) into a single jar and
conveniently run hadoop using that, or know how to include the external
jars on my command line.
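
On the bundling option: the default jar packaging includes only the project's own classes, so dependencies such as json-rpc are used for compilation but are not copied into the jar. One way to produce a single jar is the maven-shade-plugin, sketched below as an untested assumption for this project. Dependencies with scope "provided" (hadoop-core here) are left out of the shaded jar, which is usually what is wanted because the cluster supplies them.

```xml
<!-- Sketch only: goes inside <project>; merge with any existing <build> section. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- Add a <version> here; omitted because the right one depends on the setup. -->
      <executions>
        <execution>
          <!-- Run the shade goal during "mvn package" so the packaged jar
               also contains the compile-scope dependencies. -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

After "mvn package", the jar under target/ should contain the org.json classes alongside the program's own classes, so -libjars would no longer be needed for this dependency.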
This is what I have:
- maven 3.0.3
- Mac OSX
- Java 1.6.0_26
- Hadoop - CDH 0.20.2-cdh3u0
I have Googled and looked at Tom White's GitHub repo
(https://github.com/cloudera/repository-example/blob/master/pom.xml). The
more I Google, the more confused I get.
Any help is highly appreciated.
Thanks,
PD.
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.ABC</groupId>
  <artifactId>MyHadoopProgram</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <name>MyHadoopProgram</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <!-- org.json.* -->
    <dependency>
      <groupId>com.metaparadigm</groupId>
      <artifactId>json-rpc</artifactId>
      <version>1.0</version>
    </dependency>
    <!-- org.apache.hadoop.* -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.2</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>