Here's what I get, but I'm not loading any custom code:
bin/hadoop jar \
    ~/projects/lucene/mahout/clean/examples/target/mahout-examples-0.2-SNAPSHOT.job \
    org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Preparing Input
09/07/16 13:00:35 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
09/07/16 13:00:36 INFO mapred.FileInputFormat: Total input paths to
process : 1
09/07/16 13:00:38 INFO mapred.JobClient: Running job:
job_200907160952_0003
09/07/16 13:00:39 INFO mapred.JobClient: map 0% reduce 0%
09/07/16 13:00:53 INFO mapred.JobClient: map 100% reduce 0%
09/07/16 13:00:55 INFO mapred.JobClient: Job complete:
job_200907160952_0003
09/07/16 13:00:55 INFO mapred.JobClient: Counters: 8
09/07/16 13:00:55 INFO mapred.JobClient: Job Counters
09/07/16 13:00:55 INFO mapred.JobClient: Launched map tasks=2
09/07/16 13:00:55 INFO mapred.JobClient: Data-local map tasks=2
09/07/16 13:00:55 INFO mapred.JobClient: FileSystemCounters
09/07/16 13:00:55 INFO mapred.JobClient: HDFS_BYTES_READ=291644
09/07/16 13:00:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=448760
09/07/16 13:00:55 INFO mapred.JobClient: Map-Reduce Framework
09/07/16 13:00:55 INFO mapred.JobClient: Map input records=600
09/07/16 13:00:55 INFO mapred.JobClient: Spilled Records=0
09/07/16 13:00:55 INFO mapred.JobClient: Map input bytes=288374
09/07/16 13:00:55 INFO mapred.JobClient: Map output records=600
Running Canopy to get initial clusters
09/07/16 13:00:55 INFO canopy.CanopyDriver: Input: output/data Out:
output/canopies Measure:
org.apache.mahout.utils.EuclideanDistanceMeasure t1: 80.0 t2: 55.0
Vector Class: SparseVector
09/07/16 13:00:55 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
09/07/16 13:00:55 INFO mapred.FileInputFormat: Total input paths to
process : 2
09/07/16 13:00:55 INFO mapred.JobClient: Running job:
job_200907160952_0004
09/07/16 13:00:56 INFO mapred.JobClient: map 0% reduce 0%
09/07/16 13:01:08 INFO mapred.JobClient: map 100% reduce 0%
09/07/16 13:09:04 INFO mapred.JobClient: Task Id :
attempt_200907160952_0004_m_000000_0, Status : FAILED
Too many fetch-failures
09/07/16 13:09:04 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:09:05 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:11:38 INFO mapred.JobClient: Task Id :
attempt_200907160952_0004_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
09/07/16 13:11:38 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:11:38 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:19:36 INFO mapred.JobClient: Task Id :
attempt_200907160952_0004_m_000001_0, Status : FAILED
Too many fetch-failures
09/07/16 13:19:37 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:19:37 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:22:09 INFO mapred.JobClient: Task Id :
attempt_200907160952_0004_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
09/07/16 13:22:09 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:22:09 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:30:06 INFO mapred.JobClient: Task Id :
attempt_200907160952_0004_m_000000_1, Status : FAILED
Too many fetch-failures
09/07/16 13:30:06 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:30:06 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:32:39 INFO mapred.JobClient: Task Id :
attempt_200907160952_0004_r_000000_2, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
09/07/16 13:32:39 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:32:39 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:40:37 INFO mapred.JobClient: Task Id :
attempt_200907160952_0004_m_000001_1, Status : FAILED
Too many fetch-failures
09/07/16 13:40:37 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:40:37 WARN mapred.JobClient: Error reading task
outputConnection refused
09/07/16 13:43:13 INFO mapred.JobClient: Job complete:
job_200907160952_0004
09/07/16 13:43:13 INFO mapred.JobClient: Counters: 13
09/07/16 13:43:13 INFO mapred.JobClient: Job Counters
09/07/16 13:43:13 INFO mapred.JobClient: Launched reduce tasks=4
09/07/16 13:43:13 INFO mapred.JobClient: Launched map tasks=6
09/07/16 13:43:13 INFO mapred.JobClient: Data-local map tasks=6
09/07/16 13:43:13 INFO mapred.JobClient: Failed reduce tasks=1
09/07/16 13:43:13 INFO mapred.JobClient: FileSystemCounters
09/07/16 13:43:13 INFO mapred.JobClient: HDFS_BYTES_READ=448760
09/07/16 13:43:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=20880
09/07/16 13:43:13 INFO mapred.JobClient: Map-Reduce Framework
09/07/16 13:43:13 INFO mapred.JobClient: Combine output records=0
09/07/16 13:43:13 INFO mapred.JobClient: Map input records=600
09/07/16 13:43:13 INFO mapred.JobClient: Spilled Records=28
09/07/16 13:43:13 INFO mapred.JobClient: Map output bytes=20692
09/07/16 13:43:13 INFO mapred.JobClient: Map input bytes=448580
09/07/16 13:43:13 INFO mapred.JobClient: Combine input records=0
09/07/16 13:43:13 INFO mapred.JobClient: Map output records=28
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1255)
    at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:164)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:96)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
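The repeated "Connection refused" lines in the log above suggest that the reduce side could not open a connection to fetch map output. One rough way to check basic reachability from the failing node is a plain TCP probe. The sketch below is only a diagnostic aid, not anything from Hadoop or Mahout: PortProbe and canConnect are hypothetical names, and the host and port to test are assumptions you would have to take from your own TaskTracker configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: java PortProbe <tasktracker-host> <http-port>
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
    }
}
```

If the probe fails for the host name the node reports but succeeds for localhost, host name resolution would be one thing worth looking at, though that is a guess rather than a diagnosis.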
On Jul 16, 2009, at 1:57 PM, Adil Aijaz wrote:
My basic understanding of the class loader stuff is:
1. Any jars that need to be available to map/reduce jobs should be
specified through -libjars (e.g. hadoop --config ... -libjars
gson.jar jar <path to my jar> ...)
2. Any jars that need to be available to the main class should be
specified through lib/*.jar (that is, in
mahout-examples-0.2-SNAPSHOT/lib/*.jar),
unless of course, as Jeff is saying, one ends up flattening the
lib/*.jar into top-level classes.
Adil
Jeff Eastman wrote:
Isn't this the same old problem that our Job jar file has a lib
directory with the Mahout code in it, and the way Hadoop loads the
jar it sometimes cannot resolve classes in it? IIRC, one needs to
smash the job jar file into a single jar in order to run Dirichlet
(at least, and any other examples which contain non-core classes). I
confess I do not understand the class loader stuff well enough to be
more specific.
I have duplicated the CNF exception by defining and using a
user-defined distance measure in the Job file and running KMeans with
it, so it is not specific to Dirichlet.
Grant Ingersoll wrote:
Hmm, I'm not seeing the ClassNotFound problem but am getting fetch
failures. Will look later.
-Grant
On Jul 16, 2009, at 11:32 AM, Paul Ingles wrote:
I've just tried setting up a brand new machine (Ubuntu 8.04 Virtual
Machine) with Hadoop 0.20.0 and running the compiled jobs against
it. I get the same problems as before... still scratching my
head :(
On 16 Jul 2009, at 12:15, Paul Ingles wrote:
Sure,
I'm currently running on my MacBook Air, with OS X Leopard.
JDK: java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03-211)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02-83, mixed mode)
Hadoop is: 0.20.0, r763504
I'm compiling mahout from trunk (r794023) as follows (in the
root of the project directory):
% mvn install
% hadoop jar examples/target/mahout-examples-0.2-SNAPSHOT.job \
    org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
The only difference (for dirichlet) is the different class to run.
Thanks,
Paul
On 16 Jul 2009, at 11:33, Grant Ingersoll wrote:
Can you share how you built and how you are running, as in
command line options, etc.? Also, JDK version, Hadoop version,
etc.
On Jul 16, 2009, at 6:21 AM, Paul Ingles wrote:
Hi,
Thank you for the suggestion. Unfortunately, when I tried that
I received the same error. I've also tried copying the gson
jar directly into $HADOOP_HOME/lib (when I was running a
single node pseudo-distributed) and get the same error still.
Weirdly enough, if I try and run the Dirichlet example on the
cluster I receive another ClassNotFoundException:
09/07/16 10:27:54 INFO mapred.JobClient: Task Id : attempt_200907161026_0002_m_000001_0, Status : FAILED
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:352)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 13 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
    at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:95)
    at org.apache.mahout.clustering.dirichlet.DirichletMapper.configure(DirichletMapper.java:60)
    ... 18 more
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:121)
    at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
    ... 19 more
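The trace above shows the failed lookup going through sun.misc.Launcher$AppClassLoader. One way to narrow this kind of problem down is to log, from inside the task, which class loaders can actually resolve the missing class. The following is a minimal sketch using only the JDK; LoaderProbe, resolves and report are hypothetical names, and it assumes you can call report(...) from somewhere like a mapper's configure method. It does not show how Mahout or Hadoop load classes.

```java
final class LoaderProbe {

    /** Returns true if the given loader can resolve className (without initializing it). */
    static boolean resolves(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised while defining the class.
            return false;
        }
    }

    /** Builds a one-line summary comparing the system and thread context class loaders. */
    static String report(String className) {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        return className
            + " system=" + resolves(system, className)
            + " context=" + resolves(context, className);
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(report(name));
        }
    }
}
```

For example, logging report("org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution") inside the task would show whether either loader can see the class. If the two answers differ, the code doing the lookup is probably using the wrong loader. If both are false, the class is more likely missing from the task's classpath altogether.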
Hoping this sparks some other suggestions :)
Thanks,
Paul
On Wed Jul 15 22:08:09 UTC 2009, Adil Aijaz <a...@yahoo-inc.com> wrote:
try hadoop --config <hod-cluster-dir> jar -libjars <path to gson.jar> <your job/jar file> <your class> <arguments>
Adil
Paul Ingles wrote:
Hi,
Apologies for the cross-posting (I also sent this to the Hadoop user
list) but I'm still getting errors if I try and run the KMeans
examples on a cluster, whether that be my single-node Mac Pro, or our
cluster. I've attached the stack trace at the bottom of the email.
The gson jar is definitely included in the packaged .job, and is also
in the temporary directory when the task tracker picks up the work.
The gson jar also includes TypeToken.class in the expected path.
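A check like the one described above (is the class really inside the packaged .job, and where?) can be scripted with the JDK's java.util.jar classes. The sketch below is an illustration, not part of Mahout: JobJarProbe and locate are hypothetical names, and it assumes dependency jars sit under a lib/ directory inside the job file, as discussed elsewhere in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarInputStream;

final class JobJarProbe {

    /**
     * Lists where classEntry (e.g. "com/google/gson/reflect/TypeToken.class")
     * occurs: "top level" for the job file itself, or the name of a nested lib/*.jar.
     */
    static List<String> locate(String jobJarPath, String classEntry) throws IOException {
        List<String> hits = new ArrayList<>();
        try (JarFile jar = new JarFile(jobJarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (name.equals(classEntry)) {
                    hits.add("top level");
                } else if (name.startsWith("lib/") && name.endsWith(".jar")) {
                    // Scan the nested jar without unpacking it to disk.
                    try (InputStream in = jar.getInputStream(entry);
                         JarInputStream nested = new JarInputStream(in)) {
                        JarEntry inner;
                        while ((inner = nested.getNextJarEntry()) != null) {
                            if (inner.getName().equals(classEntry)) {
                                hits.add(name);
                            }
                        }
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java JobJarProbe <job file> <class entry path>
        System.out.println(locate(args[0], args[1]));
    }
}
```

An empty result would mean the class is not packaged at all. A hit only inside a nested lib/ jar would confirm the packaging and point the investigation at how the task's class loader is set up.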
Again, really appreciate people's help in getting this going!
----snip----
09/07/15 17:06:38 INFO mapred.JobClient: Task Id : attempt_200907151617_0010_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:703)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
    at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
    at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
    ... 20 more
----snip----
Incidentally, as part of this work I've also implemented a Pearson
distance measure. If people think it would be useful to have it
folded in, I'd be happy to get the SVN patch with tests and
implementation together.
Thanks,
Paul
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search