Hi,

The latest: I've updated trunk to Subversion revision 793894, and the code 
compiles and passes all of its tests (mvn install from the project 
root/checkout dir).

If I then run the kmeans example:

$ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job \
    org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

It finishes Iteration 0 but then reports the following error:

09/07/14 14:42:16 INFO mapred.JobClient:     Reduce input records=449
09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
        at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering 
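
A note on that first warning, going only by the trace above: isConverged 
appears to hand every entry under output/clusters-0 to SequenceFile.Reader, 
including _logs, which is the directory Hadoop 0.20 writes job history into 
rather than a cluster file. Below is a minimal sketch of the kind of name 
filter that might avoid this, written in plain Java so it runs without Hadoop 
on the classpath. The names isHidden and dataFiles are hypothetical and are 
not Mahout or Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipHiddenOutputs {
    // Hypothetical helper: treats names starting with "_" or "." (for
    // example _logs or .part-00000.crc) as bookkeeping entries, not data.
    public static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Keeps only the entries that look like real output files.
    public static List<String> dataFiles(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!isHidden(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> listing =
            Arrays.asList("_logs", "part-00000", ".part-00000.crc");
        System.out.println(dataFiles(listing)); // prints [part-00000]
    }
}
```

In the driver itself the same test would presumably live in a PathFilter 
passed to FileSystem.listStatus; that part is an assumption and is untested 
here.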

It then moves onto the Clustering phase and reports the following:

09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering 
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
09/07/14 14:42:17 INFO mapred.JobClient:  map 0% reduce 0%
09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
        at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
        at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
        ... 20 more

Again, I'm not sure why it can't load the gson classes: the gson jar is 
definitely in the dependencies folder, and it is included in the lib folder 
inside the built mahout-*.job.
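
One way to double-check that the class itself, and not just a jar with the 
right name, is inside the job archive is a small scan like the one below. It 
is only a sketch: FindClassInJob and locate are hypothetical names, and it 
assumes the .job file is an ordinary zip archive whose bundled jars can be 
read as nested zips.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public class FindClassInJob {
    // Returns where the class was found (top level, or inside a nested
    // jar such as lib/some.jar), or null if it is not in the archive.
    public static String locate(String archivePath, String className)
            throws IOException {
        String wanted = className.replace('.', '/') + ".class";
        try (ZipFile zip = new ZipFile(archivePath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.getName().equals(wanted)) {
                    return entry.getName();
                }
                if (entry.getName().endsWith(".jar")) {
                    // Scan the bundled jar without unpacking it to disk.
                    try (InputStream in = zip.getInputStream(entry);
                         ZipInputStream nested = new ZipInputStream(in)) {
                        ZipEntry inner;
                        while ((inner = nested.getNextEntry()) != null) {
                            if (inner.getName().equals(wanted)) {
                                return entry.getName() + "!/" + inner.getName();
                            }
                        }
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FindClassInJob <archive> <class name>");
            return;
        }
        String hit = locate(args[0], args[1]);
        System.out.println(hit == null ? "not found" : "found in " + hit);
    }
}
```

It would be run against the job file from the command above, for example:

$ java FindClassInJob ./examples/target/mahout-examples-0.2-SNAPSHOT.job \
    com.google.gson.reflect.TypeToken

This only says whether the class is packaged; it does not show what classpath 
the task JVM on the cluster actually ends up with.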



On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles wrote:
> I'm not sure, I'm afraid; the failures happened while I was building at home.
> 
> I've just updated trunk here and the current revision (793894) builds  
> successfully. I'm going to switch the cluster over to 0.20.0 and see  
> whether I can get the KMeans example to run without the GSon problem I  
> was having before.
> 
> Thanks again,
> Paul
> 
> 
> On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:
> 
> >
> > On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:
> >
> >> Hi,
> >>
> >> I've been going over the kmeans stuff the last few days to try and  
> >> understand how it works, and how I might extend it to work with the  
> >> data I'm looking to process. It's taken me a while to get a basic  
> >> understanding of things, and really appreciate having lists like  
> >> this around for support.
> >>
> >> I need to be able to label the vectors: each vector holds (for a  
> >> document) a set of similarity scores across a number of attributes.  
> >> I did some searching around payloads (after coming across the term  
> >> in some comments) but couldn't see how I add a payload to the  
> >> Vector. I then stumbled on MAHOUT-65 
> >> (https://issues.apache.org/jira/browse/MAHOUT-65) 
> >> that mentions the addition of the setName method to Vector. I've  
> >> tried building trunk, and although there were a few test failures  
> >> for other (seemingly unrelated) examples I continued and managed to  
> >> get the mahout-examples jar/job files built to give it a whirl.
> >
> > What were the errors?
