Re: Error with KMeans example in trunk (793894)

Jeff Eastman Tue, 14 Jul 2009 10:00:36 -0700

r793974 adds another validity test to the file filter used by isConverged(). This will skip over any _logs files that mysteriously get added to the clusters directories. Now, only files beginning with "part" and not ending with ".crc" will be processed.
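The filter rule described above can be sketched in plain Java. This is a minimal illustration, not the code committed in r793974: it assumes java.io.FilenameFilter as a stand-in for whatever filter type isConverged() actually uses, and the class name ClusterFileFilter is hypothetical.

```java
import java.io.File;
import java.io.FilenameFilter;

/**
 * Hypothetical sketch of the rule described for r793974: accept only "part*"
 * files that are not ".crc" checksum files. A plain java.io stand-in, not the
 * actual Mahout/Hadoop filter implementation.
 */
class ClusterFileFilter implements FilenameFilter {

    /** True for names such as "part-00000"; false for "_logs" and ".crc" files. */
    static boolean isClusterFile(String name) {
        return name.startsWith("part") && !name.endsWith(".crc");
    }

    @Override
    public boolean accept(File dir, String name) {
        // The directory is ignored; only the entry name decides.
        return isClusterFile(name);
    }

    public static void main(String[] args) {
        String[] names = {"part-00000", "part-00000.crc", "_logs", ".part-00001.crc", "part-00001"};
        for (String n : names) {
            System.out.println(n + " -> " + isClusterFile(n));
        }
    }
}
```

Filtering on a positive "part" prefix, rather than blacklisting known debris, means any other unexpected entry is skipped as well.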



Jeff Eastman wrote:

Why are log files being written to the clusters directories? That is not happening in my trunk checkout, and putting any other files into the clusters directories will break the isConverged() method and probably also the mapper & reducer configure() methods.
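A convergence check of this kind visits every cluster written by the last iteration and ends the run only if all of them have converged, which is why it expects the directory to contain nothing but cluster files. The sketch below works on in-memory centres and assumes the usual k-means criterion: a cluster has converged once its centre moved no more than a convergence delta (0.5 in Paul's run) under Euclidean distance. ConvergenceCheck and its methods are hypothetical names, not Mahout's API.

```java
import java.util.List;

/**
 * Hypothetical sketch: a cluster counts as converged when its centre moved no
 * more than convergenceDelta (Euclidean distance) in the last iteration; the
 * run stops only when every cluster has converged.
 */
final class ConvergenceCheck {

    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** oldCenters.get(i) and newCenters.get(i) are the same cluster before and after an iteration. */
    static boolean allConverged(List<double[]> oldCenters, List<double[]> newCenters, double convergenceDelta) {
        if (oldCenters.size() != newCenters.size()) throw new IllegalArgumentException("cluster count mismatch");
        for (int i = 0; i < oldCenters.size(); i++) {
            if (euclidean(oldCenters.get(i), newCenters.get(i)) > convergenceDelta) {
                return false; // one cluster still moving is enough to require another iteration
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<double[]> before = List.of(new double[] {0.0, 0.0}, new double[] {10.0, 10.0});
        List<double[]> after = List.of(new double[] {0.3, 0.0}, new double[] {10.0, 11.0});
        System.out.println(allConverged(before, after, 0.5)); // false: second centre moved 1.0
    }
}
```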
Grant Ingersoll wrote:
Are you running in standalone, pseudo-distributed or fully distributed mode in Hadoop?
It looks like a permission error in Hadoop, but maybe we need to make sure we have appropriate access. I'm not that familiar with the Hadoop permission capabilities.
On Jul 14, 2009, at 10:46 AM, Paul Ingles wrote:
I'm definitely scratching my head now, although I think it's most likely some kind of dodgy configuration/setup on the cluster I'm using: if I run some of the other examples I get class loading errors for the example classes!
I downloaded a fresh and unconfigured release of Hadoop 0.20 and a new checkout of Mahout trunk, and it compiled, tested, and ran through the kmeans example without trouble.
If I find out what causes the problem I'll let the list know.

Thanks,
Paul

On 14 Jul 2009, at 15:01, Paul Ingles wrote:
Hi,
The latest: I've updated trunk to Subversion revision 793894, and the code compiles and runs all of its tests successfully (mvn install inside the project root/checkout dir).
If I then run the kmeans example:
$ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
It finishes the Iteration 0 but then errors with the following:

09/07/14 14:42:16 INFO mapred.JobClient:     Reduce input records=449
09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
    at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering

It then moves on to the Clustering phase and reports the following:

09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/dataClusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 InputVectors: org.apache.mahout.matrix.SparseVector
09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
09/07/14 14:42:17 INFO mapred.JobClient:  map 0% reduce 0%
09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
    at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
    at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
    ... 20 more
Again, not sure why it's not able to load the gson jar file; it's definitely in the dependencies folder and is included in the built mahout-*.job inside the lib folder.
On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles wrote:
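One way to narrow down a NoClassDefFoundError like this is to ask the failing JVM directly whether the class is visible to its class loader. The trace shows the error inside a map task (org.apache.hadoop.mapred.Child), so a check run on the client side may succeed even when the task-side classpath lacks the jar. ClassPathCheck is a hypothetical diagnostic helper, not part of Mahout or Hadoop; it only relies on Class.forName with initialization disabled.

```java
/**
 * Hypothetical diagnostic helper: reports whether a class can be loaded by the
 * class loader that loaded this class, without running static initializers.
 */
final class ClassPathCheck {

    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassPathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. a dependency of the class is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "com.google.gson.reflect.TypeToken";
        System.out.println(name + " loadable: " + isLoadable(name));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Printing java.class.path alongside the result shows which jars that particular JVM was started with, which is the piece that differs between the client and the task.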
I'm not sure, I'm afraid; they happened whilst I was building at home.

I've just updated trunk here and the current revision (793894) builds
successfully. I'm going to switch the cluster over to 0.20.0 and see
whether I can get the KMeans example to run without the Gson problem I
was having before.

Thanks again,
Paul


On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:
On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
Hi,

I've been going over the kmeans stuff the last few days to try and
understand how it works, and how I might extend it to work with the
data I'm looking to process. It's taken me a while to get a basic
understanding of things, and I really appreciate having lists like
this around for support.

I need to be able to label the vectors: each vector holds (for a
document) a set of similarity scores across a number of attributes.
I did some searching around payloads (after coming across the term
in some comments) but couldn't see how I add a payload to the
Vector. I then stumbled on MAHOUT-65 (https://issues.apache.org/jira/browse/MAHOUT-65)
that mentions the addition of the setName method to Vector. I've
tried building trunk, and although there were a few test failures
for other (seemingly unrelated) examples I continued and managed to
get the mahout-examples jar/job files built to give it a whirl.
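A minimal sketch of the kind of labelled vector being described, assuming nothing about Mahout's Vector interface: the hypothetical LabeledVector class keeps a document name next to its similarity scores, so clustered output can be traced back to the document. MAHOUT-65's setName() on Vector is the route mentioned in the thread; this stand-in takes the name in its constructor instead.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for a named vector: a document label carried alongside
 * its per-attribute similarity scores. Illustrative only; not Mahout's Vector API.
 */
final class LabeledVector {
    private final String name;
    private final double[] values;

    LabeledVector(String name, double[] values) {
        this.name = name;
        this.values = values.clone(); // defensive copy so callers cannot mutate the scores
    }

    String getName() { return name; }

    double get(int index) { return values[index]; }

    int size() { return values.length; }

    @Override
    public String toString() {
        return name + Arrays.toString(values);
    }

    public static void main(String[] args) {
        LabeledVector v = new LabeledVector("doc-42", new double[] {0.9, 0.1, 0.5});
        System.out.println(v); // doc-42[0.9, 0.1, 0.5]
    }
}
```

Making the label immutable keeps it from drifting away from its scores as the vector passes through mappers and reducers; a mutable setName() trades that guarantee for convenience.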
What were the errors?
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
