I am still trying to make this work. I am running Amazon Elastic MapReduce
(EMR) with the latest mahout-examples-0.2-SNAPSHOT.job as follows (using the
Ruby CLI):
ruby elastic-mapreduce -j j-26RJO9A4WJIJS
  --jar s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job
  --main-class org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
  --arg s3n://myBucket/mahout-input/synthetic-control.data
  --arg s3n://myBucket/mahout-output/dirichlet
  --arg org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
  --arg 10 --arg 5 --arg 1.0 --arg 1
This gave me the class-not-found error mentioned in my previous email.
I have tried the following: I moved the DirichletJob class from the core
project into the examples project, putting it in
org.apache.mahout.clustering.syntheticcontrol.dirichlet. The rationale is
that the classloader then no longer needs to look into
lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; instead it
finds it directly alongside Job.class.
This got me one step further, but an error of the same type stops me again:
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125)
    at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
    ... 8 more
This happens on a .loadClass() call made through the current thread's
context classloader. I have tried running this example on my local
single-node Hadoop installation, where it runs fine. The error above occurs
only on Amazon Elastic MapReduce, and it definitely seems to be a
classloading issue.
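To narrow this kind of problem down, a small probe along the following lines could be run in both environments. It is only a sketch: ClassLoadingDemo, probe and canLoad are hypothetical names, not part of Mahout or Hadoop, and it merely reports whether a class name resolves through the thread's context classloader and through the loader of the probing class itself.

```java
// Hypothetical diagnostic helper; not part of Mahout or Hadoop.
class ClassLoadingDemo {

    /** Reports whether the name resolves via the context loader and via this class's own loader. */
    public static String probe(String className) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader own = ClassLoadingDemo.class.getClassLoader();
        return "context=" + canLoad(context, className)
                + ", own=" + canLoad(own, className);
    }

    /** Tries to resolve the class without initializing it; returns false if it cannot be found. */
    public static boolean canLoad(ClassLoader loader, String className) {
        try {
            // A null loader makes Class.forName fall back to the bootstrap loader.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the class to look for as the first argument, for example the
        // model distribution class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + ": " + probe(name));
    }
}
```

If the two loaders disagree for the same class name on EMR but agree locally, that would point at how the task's classpath is assembled rather than at the contents of the .job file.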
Any ideas?
Thanks
Sebastien
2009/5/15 Sebastien Bratieres wrote:
Hi,
Thanks Grant, that did it. I'll figure out later what's going on.
Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I want
to run the Dirichlet example, which I launch with
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
class from the mahout-examples-0.2-SNAPSHOT.job.
This fails with
java.lang.NoClassDefFoundError: org/apache/mahout/clustering/dirichlet/DirichletJob
    at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
    at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
DirichletJob is located in the .job file, inside
lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't find
it.
One difference between kMeans and Dirichlet is the class passed to JobConf:
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
    JobConf conf = new JobConf(Job.class);
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
    JobConf conf = new JobConf(DirichletJob.class);
i.e. the Dirichlet version uses a job class which is in core, while the
kMeans version uses the currently executing Job class from examples. Is
there an issue with this?
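As far as I understand, JobConf(Class) uses the class it is given to work out which jar to ship as the job jar, so the choice of class matters. The sketch below is not Hadoop's code; JarLocator and its methods are hypothetical names. It only shows the general kind of lookup involved: turning a class into a resource name and asking its classloader where that resource comes from.

```java
import java.net.URL;

// Hypothetical helper illustrating a "which jar holds this class" lookup;
// this is an assumption about the general technique, not Hadoop's implementation.
class JarLocator {

    /** Converts a class to the resource path of its .class file. */
    public static String resourceNameFor(Class<?> cls) {
        return cls.getName().replace('.', '/') + ".class";
    }

    /** Returns the URL the class file is visible at, or null if it cannot be located. */
    public static URL locate(Class<?> cls) {
        String resource = resourceNameFor(cls);
        ClassLoader loader = cls.getClassLoader();
        // Classes from the bootstrap loader report a null loader.
        return loader != null
                ? loader.getResource(resource)
                : ClassLoader.getSystemResource(resource);
    }

    public static void main(String[] args) {
        // For a class packaged in a jar, the printed URL names that jar;
        // for a class in a directory, it is a plain file URL.
        System.out.println(locate(JarLocator.class));
    }
}
```

Running something like this for Job.class and for DirichletJob.class would show whether the two resolve to the .job file itself or to the nested lib/mahout-core-0.2-SNAPSHOT.jar.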
What should I do to work around this error? Does the MANIFEST.MF file of
the .job contain a pointer to the /lib directory, so that the jars there are
visible to the jar classloader?
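One way to check what the manifest actually declares is to read its Class-Path attribute. The following is a minimal sketch using only java.util.jar; ManifestClassPath and its methods are hypothetical names. Note that the standard JVM resolves Class-Path entries relative to the jar's own location, not to jars nested inside it, and whether Hadoop on EMR looks at this attribute at all is a separate question that this code does not answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper; it only reports what a manifest declares.
class ManifestClassPath {

    /** Returns the space-separated Class-Path entries of a manifest, or an empty list. */
    public static List<String> entries(Manifest manifest) {
        String value = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }

    /** Convenience overload that parses manifest text held in memory. */
    public static List<String> entriesFromText(String manifestText) {
        byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
        try {
            return entries(new Manifest(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ManifestClassPath <path-to-jar-or-job-file>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest();
            System.out.println(manifest == null ? "no manifest" : entries(manifest));
        }
    }
}
```

Pointing it at the .job file would show whether any Class-Path entries are present at all.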
Thanks
Sebastien
2009/5/14 Grant Ingersoll wrote:
Try running mvn install from the top level dir first.
On May 14, 2009, at 11:22 AM, Sebastien Bratieres wrote:
Hi,
I'd like to walk in the footsteps of Stephen Green running Mahout on EMR.
He points out that the fix to issue 118 is needed to do that (I first ran
into the file system error too). I'm a first-time Maven user and I don't
know how to rebuild the mahout-examples-1.0.job file once I have retrieved
revision 765769 from SVN (I use Eclipse). I have tried the following:
- highlight the mahout-examples project
- right-click Run As / Maven package (though I'm not at all sure that
  Maven package is the right option to use!)
but that gives me this error:
---
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building Mahout examples
[INFO]
[INFO] Id: org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 0 resource
[INFO] [resources:copy-resources]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO] [compiler:compile]
[INFO] Nothing to compile - all classes are up to date
[INFO] [resources:testResources]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[ERROR]
Transitive dependency resolution for scope: test has failed for your project.
Error message: Missing:
----------
1) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
Try downloading the file manually from the project website.
Then, install it using the command:
mvn install:install-file -DgroupId=org.apache.mahout -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests -Dpackaging=test-jar -Dfile=/path/to/file
Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=org.apache.mahout -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests -Dpackaging=test-jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
Path to dependency:
1) org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
2) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
----------
1 required artifact is missing.
for artifact:
org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
from the specified remote repositories:
Apache snapshots (http://people.apache.org/maven-snapshot-repository),
maven2-repository.dev.java.net (http://download.java.net/maven/2),
central (http://repo1.maven.org/maven2)
Group-Id: org.apache.mahout
Artifact-Id: mahout-examples
Version: 0.2-SNAPSHOT
From file: C:\workspace\mahout\examples\pom.xml
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run with the -e flag
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILED
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6 seconds
[INFO] Finished at: Thu May 14 16:58:46 CEST 2009
[INFO] Final Memory: 3M/22M
[INFO] ------------------------------------------------------------------------
---
So again, my goal is to have a new mahout-examples-1.0.job file or
equivalent that contains the patch for 118 and will run on EMR. What is the
right way to do this?
Thanks
Sebastien
--------------------------
Grant Ingersoll