Hello, I'm trying to trigger a Mahout job from inside my Java application (running in Eclipse), and get it running on my cluster. I have a main class that simply contains:
    String[] args = new String[] {
            "--input", "/input/triples.csv",
            "--output", "/output/vectors.txt",
            "--similarityClassname", VectorSimilarityMeasures.SIMILARITY_COOCCURRENCE.toString(),
            "--numRecommendations", "10000",
            "--tempDir", "temp/" + System.currentTimeMillis() };
    Configuration conf = new Configuration();
    ToolRunner.run(conf, new RecommenderJob(), args);

If I package the whole project up into a single jar (using Maven), copy it to the namenode, and run it with "hadoop jar project.jar", it works fine. But if I try to run it from my dev PC in Eclipse (with all the same dependencies still on the classpath), and add the three Hadoop XML configuration files to the classpath, it launches the Hadoop jobs, but they fail with errors like:

    12/07/26 14:42:09 INFO mapred.JobClient: Task Id : attempt_201206261211_0173_m_000001_0, Status : FAILED
    Error: java.lang.ClassNotFoundException: com.google.common.primitives.Longs
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        ...

What I'm trying to create is a self-contained JAR that can be run from the command line and launch the Mahout job on the cluster. I've got this all working with embedded Pig scripts, but I can't get it working here. Any help is appreciated, as is advice on better ways to trigger the jobs from code. Thanks.
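For reference, here is the launcher written out as a self-contained class with the imports I believe it needs. The package for the VectorSimilarityMeasures enum is from the Mahout 0.7-era layout and may differ in other versions, and the class name is just a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
    import org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.VectorSimilarityMeasures;

    public class RecommenderLauncher {
        public static void main(String[] ignored) throws Exception {
            String[] args = new String[] {
                    "--input", "/input/triples.csv",
                    "--output", "/output/vectors.txt",
                    "--similarityClassname", VectorSimilarityMeasures.SIMILARITY_COOCCURRENCE.toString(),
                    "--numRecommendations", "10000",
                    "--tempDir", "temp/" + System.currentTimeMillis() };

            // Reads the cluster settings from the Hadoop *-site.xml files on the classpath
            Configuration conf = new Configuration();

            // RecommenderJob implements Tool, so ToolRunner parses the generic Hadoop options
            ToolRunner.run(conf, new RecommenderJob(), args);
        }
    }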