r793974 adds another validity test to the isConverged() valid file filter. This will skip over any _logs entries that mysteriously get added to the clusters directories. Now, only files beginning with "part" and not ending with ".crc" will be processed.
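
For readers following along, here is a minimal sketch of the naming rule described above. It works on plain file names rather than Hadoop Path objects so that it runs standalone. The class and method names are hypothetical, and this is not the code committed in r793974.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the naming rule; not the actual Mahout code. */
public class PartFileFilter {

  /** Accepts names that begin with "part" and do not end with ".crc". */
  public static boolean accept(String fileName) {
    return fileName.startsWith("part") && !fileName.endsWith(".crc");
  }

  /** Returns only the names that pass accept(), preserving their order. */
  public static List<String> filter(List<String> fileNames) {
    List<String> kept = new ArrayList<>();
    for (String name : fileNames) {
      if (accept(name)) {
        kept.add(name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Example directory listing (assumed names, in the style of a clusters-N directory).
    List<String> listing =
        Arrays.asList("part-00000", "part-00000.crc", "_logs", "part-00001");
    System.out.println(filter(listing));
  }
}
```

In a real driver the same test would presumably be applied to the last component of each path in the clusters directory, so that a _logs entry is never handed to the SequenceFile reader.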


Jeff Eastman wrote:
Why are log files being written to the clusters directories? That is not happening in my trunk checkout and putting any other files into the clusters directories will break the isConverged() method and probably also the mapper & reducer configure() methods.


Grant Ingersoll wrote:
Are you running in standalone, pseudo-distributed or fully distributed mode in Hadoop?

It looks like a permission error in Hadoop, but maybe we need to make sure we have appropriate access. I'm not that familiar with the Hadoop permission capabilities.

On Jul 14, 2009, at 10:46 AM, Paul Ingles wrote:

I'm definitely scratching my head now, although I think it's most likely some kind of dodgy configuration/setup on the cluster I'm using: if I run some of the other examples, I get class loading errors for the example classes!

I downloaded a fresh and unconfigured release of Hadoop 0.20 and a new checkout of Mahout trunk, and it compiled, tested, and ran through the kmeans example without trouble.

If I find out what causes the problem I'll let the list know.

Thanks,
Paul

On 14 Jul 2009, at 15:01, Paul Ingles wrote:

Hi,

The latest: I've updated to Subversion revision 793894 for trunk; the code compiles and runs all of its tests successfully (mvn install inside the project root/checkout dir).

If I then run the kmeans example:

$ hadoop jar ./examples/target/mahout-examples-0.2-SNAPSHOT.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

It finishes Iteration 0 but then fails with the following:

09/07/14 14:42:16 INFO mapred.JobClient:     Reduce input records=449
09/07/14 14:42:16 WARN kmeans.KMeansDriver: java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
java.io.IOException: Cannot open filename /user/pair/output/clusters-0/_logs
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1444)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1435)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:347)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
    at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.isConverged(KMeansDriver.java:304)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:241)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.runJob(KMeansDriver.java:194)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:100)
    at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering

It then moves on to the Clustering phase and reports the following:

09/07/14 14:42:16 INFO kmeans.KMeansDriver: Clustering
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Running Clustering
09/07/14 14:42:16 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output/points Distance: org.apache.mahout.utils.EuclideanDistanceMeasure
09/07/14 14:42:16 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.matrix.SparseVector
09/07/14 14:42:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/07/14 14:42:16 INFO mapred.FileInputFormat: Total input paths to process : 271
09/07/14 14:42:16 INFO mapred.JobClient: Running job: job_200907141434_0004
09/07/14 14:42:17 INFO mapred.JobClient:  map 0% reduce 0%
09/07/14 14:42:28 INFO mapred.JobClient: Task Id : attempt_200907141434_0004_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: com/google/gson/reflect/TypeToken
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
    at org.apache.mahout.matrix.AbstractVector.asFormatString(AbstractVector.java:374)
    at org.apache.mahout.clustering.kmeans.Cluster.outputPointWithClusterInfo(Cluster.java:198)
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:39)
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.map(KMeansClusterMapper.java:32)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: com.google.gson.reflect.TypeToken
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:288)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
    ... 20 more

Again, I'm not sure why it can't load the gson jar file: it's definitely in the dependencies folder and is included in the lib folder inside the built mahout-*.job.
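
One way to double-check where a class is packaged is to scan the jars under lib/ inside the .job archive for its class file. The following is only a rough diagnostic sketch: JobJarInspector and findClass are hypothetical names, and the lib/ layout is taken from the description above. It reports where the class file is packaged, not whether the task classloader on the cluster will actually see it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

/** Hypothetical diagnostic; not part of Mahout or Hadoop. */
public class JobJarInspector {

  /** Returns the lib/*.jar entries of the archive that contain the given class. */
  public static List<String> findClass(String jobPath, String className) throws IOException {
    // A class com.example.Foo is stored in a jar as com/example/Foo.class.
    String wanted = className.replace('.', '/') + ".class";
    List<String> hits = new ArrayList<>();
    try (ZipFile job = new ZipFile(jobPath)) {
      Enumeration<? extends ZipEntry> entries = job.entries();
      while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        String name = entry.getName();
        if (!name.startsWith("lib/") || !name.endsWith(".jar")) {
          continue; // only look at bundled dependency jars
        }
        // Read the nested jar as a stream and look for the class file.
        try (InputStream in = job.getInputStream(entry);
             ZipInputStream nested = new ZipInputStream(in)) {
          ZipEntry inner;
          while ((inner = nested.getNextEntry()) != null) {
            if (inner.getName().equals(wanted)) {
              hits.add(name);
              break;
            }
          }
        }
      }
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: JobJarInspector <job-file> <fully.qualified.ClassName>");
      return;
    }
    System.out.println(findClass(args[0], args[1]));
  }
}
```

Run against the built job file with com.google.gson.reflect.TypeToken as the class name, an empty result would suggest the class is not packaged where expected; a non-empty one would point the investigation back at the cluster setup.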



On Tue Jul 14 13:31:53 UTC 2009, Paul Ingles wrote:
I'm not sure, I'm afraid; they happened whilst I was building at home.

I've just updated trunk here and the current revision (793894) builds
successfully. I'm going to switch the cluster over to 0.20.0 and see
whether I can get the KMeans example to run without the GSon problem I
was having before.

Thanks again,
Paul


On 14 Jul 2009, at 14:04, Grant Ingersoll wrote:


On Jul 13, 2009, at 7:02 PM, Paul Ingles wrote:

Hi,

I've been going over the kmeans stuff the last few days to try and
understand how it works, and how I might extend it to work with the
data I'm looking to process. It's taken me a while to get a basic
understanding of things, and I really appreciate having lists like
this around for support.

I need to be able to label the vectors: each vector holds (for a
document) a set of similarity scores across a number of attributes.
I did some searching around payloads (after coming across the term
in some comments) but couldn't see how I add a payload to the
Vector. I then stumbled on MAHOUT-65 (https://issues.apache.org/jira/browse/MAHOUT-65)
that mentions the addition of the setName method to Vector. I've
tried building trunk, and although there were a few test failures
for other (seemingly unrelated) examples I continued and managed to
get the mahout-examples jar/job files built to give it a whirl.

What were the errors?


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search






