What do you get when you run on good ol' Hadoop, i.e. the one we actually support and build and test on?
On Jun 10, 2011, at 7:38 PM, Jeff Eastman wrote:
> Moving to @dev
>
> Hi Drew,
>
> Don't know what is happening, but I did a clean unpack of the 0.5 distro,
> mvn install and ran build-reuters.sh. It downloaded the data but failed
> exactly as before. Both continue to run just fine on my trunk build since
> I updated yesterday. IIRC, they were both failing with trunk before 0.5 too.
>
> On MapR:
> [dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> Downloading Reuters-21578
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 7959k  100 7959k    0     0  1769k      0  0:00:04  0:00:04 --:--:-- 1788k
> Extracting...
> Running on hadoop, using HADOOP_HOME=/opt/mapr/hadoop/hadoop-0.20.2
> HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-0.20.2/conf.new
> 11/06/10 16:12:19 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
> Deleting all files in mahout-work/reuters-out-tmp
> 11/06/10 16:12:24 INFO driver.MahoutDriver: Program took 4085 ms
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> Jun 10, 2011 4:12:25 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=mahout-work/reuters-out, --keyPrefix=, --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: maprfs
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>         at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:62)
>         at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> rmr: cannot remove mahout-work/reuters-out-seqdir: No such file or directory.
> put: File mahout-work/reuters-out-seqdir does not exist.
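[A note on the MapR failure above: "No FileSystem for scheme: maprfs" means the client-side Hadoop FileSystem factory has no implementation registered for the maprfs URI scheme. One plausible remedy, an assumption on my part rather than anything verified in this thread, is to make sure the MapR client jars are on the job classpath and that the client's core-site.xml maps the scheme to MapR's FileSystem class via Hadoop's fs.<scheme>.impl convention:]

```xml
<!-- core-site.xml fragment (hypothetical fix; verify the class name
     against your MapR client installation). Hadoop resolves a URI
     scheme like maprfs:// by looking up fs.<scheme>.impl. -->
<property>
  <name>fs.maprfs.impl</name>
  <value>com.mapr.fs.MapRFileSystem</value>
</property>
```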
>
> And then, after changing HADOOP_HOME & HADOOP_CONF_DIR to CDH3 on a fresh
> untar/install of 0.5:
> [dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> Downloading Reuters-21578
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 7959k  100 7959k    0     0  1707k      0  0:00:04  0:00:04 --:--:-- 1768k
> Extracting...
> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
> HADOOP_CONF_DIR=/usr/lib/hadoop/hadoop1.conf
> 11/06/10 16:29:42 WARN driver.MahoutDriver: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
> Deleting all files in mahout-work/reuters-out-tmp
> 11/06/10 16:29:45 INFO driver.MahoutDriver: Program took 3669 ms
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> Jun 10, 2011 4:30:02 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=mahout-work/reuters-out, --keyPrefix=, --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.io.IOException: Call to hadoop1.eng.narus.com/172.31.2.200:8020 failed on local exception: java.io.EOFException
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>         at org.apache.hadoop.ipc.Client.call(Client.java:743)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy0.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>         at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>         at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:62)
>         at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> rmr: cannot remove mahout-work/reuters-out-seqdir: No such file or directory.
> put: File mahout-work/reuters-out-seqdir does not exist.
>
> I do notice that, after each of these runs on a pristine untar/install, I get
> a slightly different initial output but the same exception:
> [dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> Jun 10, 2011 4:33:07 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=mahout-work/reuters-out, --keyPrefix=, --output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.io.IOException: Call to hadoop1.eng.narus.com/172.31.2.200:8020 failed on local exception: java.io.EOFException
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>         at org.apache.hadoop.ipc.Client.call(Client.java:743)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy0.getProtocolVersion(Unknown Source)
>
> There is no $MAHOUT_LOCAL in my environment but I notice the script does set
> this internally. Something must be different in trunk but I cannot find it.
>
> -----Original Message-----
> From: Drew Farris [mailto:[email protected]]
> Sent: Friday, June 10, 2011 2:57 PM
> To: [email protected]
> Subject: Re: Problems running examples
>
> Hmm, I've been able to download the 0.5 src release and run it in
> clustered mode. In most cases it completes fine. I ran into problems
> once when I had left a mahout-work directory lying around from a
> partially completed (aborted) run. I wonder if that could have
> something to do with the failures you are seeing too, Jeff?
>
> The binary release of 0.5 is most definitely broken, but that breakage
> was discussed in another thread and is due to classpath issues in
> bin/mahout vs. where things are placed in the binary release.
>
> On Fri, Jun 10, 2011 at 12:34 PM, Jeff Eastman <[email protected]> wrote:
>> I'm still trying to figure out why reuters-0.5 does not work on either of my
>> clusters. The scripts themselves have no diff and the environment variables
>> are set as in trunk except for MAHOUT_HOME.
>> The synthetic control and 20 newsgroups examples run on both clusters
>> without problems (well, 20 newsgroups has a Version Mismatch error on CDH3,
>> but that is another story). But when I run reuters on 0.5 I see
>> "MAHOUT_LOCAL is set, running locally" followed by file IO exceptions in
>> MahoutDriver that are cluster dependent. When I run it on trunk, I don't
>> see this and it works just fine.
>>
>> -----Original Message-----
>> From: Drew Farris [mailto:[email protected]]
>> Sent: Thursday, June 09, 2011 5:36 PM
>> To: [email protected]
>> Subject: Re: Problems running examples
>>
>> Jeff, no impugning perceived, and thanks for running the variety of
>> tests. So it appears that trunk is fine and 0.5 isn't. I'll try to
>> determine what did (or didn't) make it into 0.5 that causes its
>> brokenness.
>>
>> Mark, in the meantime, no need to run all of the tests I've asked
>> about previously. Just give trunk a try and see if that resolves your
>> problem.
>>
>> On Thu, Jun 9, 2011 at 7:21 PM, Jeff Eastman <[email protected]> wrote:
>>> Hi Drew,
>>>
>>> Running trunk locally, latest update, just now, build-reuters.sh works
>>> (kmeans and lda).
>>>
>>> Running trunk on my CDH3 cluster, just now:
>>> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
>>> - build-reuters.sh works (with kmeans and lda)
>>>
>>> Running trunk on my MapR cluster, just now:
>>> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
>>> - build-reuters.sh works (with kmeans and lda)
>>>
>>> Running the 5/31 mahout-distribution-0.5, just now:
>>> - build-cluster-syntheticcontrol.sh works (CDH3 & MapR with kmeans and others)
>>> - build-reuters.sh runs in local mode only (CDH3 & MapR runs give different errors)
>>>
>>> I was primarily defending kmeans. It is possible my 5/31 0.5 distribution
>>> is not the final one, since everything seems kosher in trunk now. My
>>> apologies if I've impugned your patch.
>>>
>>> Jeff
>>>
>>> -----Original Message-----
>>> From: Drew Farris [mailto:[email protected]]
>>> Sent: Thursday, June 09, 2011 11:36 AM
>>> To: [email protected]
>>> Subject: Re: Problems running examples
>>>
>>> Jeff,
>>>
>>> Could you tell me about what's failing in KMeans and LDA when running
>>> on a cluster? I had this working just prior to 0.5 in
>>> https://issues.apache.org/jira/browse/MAHOUT-694
>>>
>>> Thanks,
>>>
>>> Drew
>>>
>>> On Thu, Jun 9, 2011 at 2:01 PM, Jeff Eastman <[email protected]> wrote:
>>>> Ahem, KMeans is not busted. It is being maintained by me, at least. The
>>>> build-reuters.sh script runs only in local mode on 0.5 and fails in both
>>>> KMeans and LDA when run on a cluster. The MIA examples are not always
>>>> correct. Most of this has been reported before.
>>>>
>>>> -----Original Message-----
>>>> From: Sean Owen [mailto:[email protected]]
>>>> Sent: Thursday, June 09, 2011 12:29 AM
>>>> To: [email protected]
>>>> Subject: Re: Problems running examples
>>>>
>>>> (Assuming you are on HEAD,) I think KMeans is busted -- this has come up
>>>> before. I don't know if it is being maintained. Anyone who's willing to
>>>> step up and fix it is also welcome to overhaul it IMHO.
>>>>
>>>> On Thu, Jun 9, 2011 at 12:03 AM, Hector Yee <[email protected]> wrote:
>>>>> I got a slightly different error on the next line of KMeansDriver.java
>>>>> (running on OS X Snow Leopard)
>>>>>
>>>>> 11/06/08 16:02:12 INFO compress.CodecPool: Got brand-new compressor
>>>>> Exception in thread "main" java.lang.ClassCastException:
>>>>> org.apache.hadoop.io.IntWritable cannot be cast to
>>>>> org.apache.mahout.math.VectorWritable
>>>>>         at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
>>>>>         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)
>>>>>
>>>>> On Sun, Jun 5, 2011 at 9:31 PM, Jeff Eastman <[email protected]> wrote:
>>>>>> IIRC, Reuters used to run on a cluster but no longer does due to some
>>>>>> obscure Lucene changes. In 0.5 it only works in local mode. I really hope
>>>>>> this can be repaired by 0.6 as Reuters is a key entry point into Mahout
>>>>>> clustering for many users.
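[A closing note on the recurring "MAHOUT_LOCAL is set, running locally" clue: that message comes from the launcher's mode-selection logic. The sketch below is a simplified reconstruction of how a bin/mahout-style launcher typically decides between local and cluster execution, not the verbatim 0.5 script; consult the distribution's own bin/mahout for the real logic. Jeff's symptom of the variable being "set" despite never exporting it would be consistent with an earlier step of the example script setting it internally before this check runs.]

```shell
#!/bin/sh
# Simplified sketch (assumed, not verbatim) of a bin/mahout-style
# run-mode check: an explicitly set MAHOUT_LOCAL forces local mode;
# otherwise a usable Hadoop install sends the job through hadoop;
# failing both, fall back to local.
choose_mode() {
  if [ -n "$MAHOUT_LOCAL" ]; then
    echo "local"     # MAHOUT_LOCAL is set, running locally
  elif [ -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "hadoop"    # hand the job jar to $HADOOP_HOME/bin/hadoop
  else
    echo "local"     # no usable Hadoop install found
  fi
}

# Demonstration with a throwaway fake hadoop executable:
MAHOUT_LOCAL="true" choose_mode                          # prints: local
MAHOUT_LOCAL="" HADOOP_HOME="/nonexistent" choose_mode   # prints: local
HADOOP_HOME="$(mktemp -d)"
mkdir -p "$HADOOP_HOME/bin"
printf '#!/bin/sh\n' > "$HADOOP_HOME/bin/hadoop"
chmod +x "$HADOOP_HOME/bin/hadoop"
MAHOUT_LOCAL="" choose_mode                              # prints: hadoop
```

The practical debugging step this suggests: trace where the example script exports MAHOUT_LOCAL before bin/mahout runs, rather than looking for the variable in the user's own environment.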
