Moving to @dev

Hi Drew,

I don't know what is happening, but I did a clean unpack of the 0.5 distro, ran 
mvn install, and ran build-reuters.sh. It downloaded the data but failed exactly 
as before. Both clusters continue to run it just fine from my trunk build since I 
updated yesterday. IIRC, they were both failing with trunk before 0.5 too.

On MapR:
[dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
Downloading Reuters-21578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7959k  100 7959k    0     0  1769k      0  0:00:04  0:00:04 --:--:-- 1788k
Extracting...
Running on hadoop, using HADOOP_HOME=/opt/mapr/hadoop/hadoop-0.20.2
HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-0.20.2/conf.new
11/06/10 16:12:19 WARN driver.MahoutDriver: No 
org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will 
use command-line arguments only
Deleting all files in mahout-work/reuters-out-tmp
11/06/10 16:12:24 INFO driver.MahoutDriver: Program took 4085 ms
MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Jun 10, 2011 4:12:25 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, 
--endPhase=2147483647, 
--fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, 
--input=mahout-work/reuters-out, --keyPrefix=, 
--output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
Exception in thread "main" java.io.IOException: No FileSystem for scheme: maprfs
        at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
        at 
org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:62)
        at 
org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at 
org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
rmr: cannot remove mahout-work/reuters-out-seqdir: No such file or directory.
put: File mahout-work/reuters-out-seqdir does not exist.
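
For what it's worth, in Hadoop 0.20.x that "No FileSystem for scheme: maprfs" 
means the Configuration the local JVM loaded has no fs.<scheme>.impl mapping for 
maprfs. A quick check I plan to run (the property name is the standard 0.20-style 
scheme-to-class mapping; I'm assuming MapR wires itself in the same way):

  # Is the maprfs scheme defined in the conf dir the cluster run would use?
  grep -r "fs.maprfs.impl" /opt/mapr/hadoop/hadoop-0.20.2/conf.new/
  # If MAHOUT_LOCAL forces a plain local JVM for this step, that conf dir (and
  # the MapR filesystem jar) may simply not be on its classpath.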

And then, after changing HADOOP_HOME & HADOOP_CONF_DIR to CDH3 on a fresh 
untar/install of 0.5:
[dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
Downloading Reuters-21578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7959k  100 7959k    0     0  1707k      0  0:00:04  0:00:04 --:--:-- 1768k
Extracting...
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/hadoop1.conf
11/06/10 16:29:42 WARN driver.MahoutDriver: No 
org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will 
use command-line arguments only
Deleting all files in mahout-work/reuters-out-tmp
11/06/10 16:29:45 INFO driver.MahoutDriver: Program took 3669 ms
MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Jun 10, 2011 4:30:02 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, 
--endPhase=2147483647, 
--fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, 
--input=mahout-work/reuters-out, --keyPrefix=, 
--output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
Exception in thread "main" java.io.IOException: Call to 
hadoop1.eng.narus.com/172.31.2.200:8020 failed on local exception: 
java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
        at org.apache.hadoop.ipc.Client.call(Client.java:743)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at 
org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
        at 
org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:62)
        at 
org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at 
org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
rmr: cannot remove mahout-work/reuters-out-seqdir: No such file or directory.
put: File mahout-work/reuters-out-seqdir does not exist.
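
On the CDH3 side, the bare EOFException on the NameNode call usually points at an 
IPC version mismatch between whatever Hadoop client jars the local run picks up 
and what the CDH3 namenode speaks -- that is only a guess, but it is easy to 
compare (the dependency directory is the one the SLF4J warnings above show on the 
classpath):

  /usr/lib/hadoop/bin/hadoop version                # what the CDH3 cluster runs
  ls examples/target/dependency/ | grep -i hadoop   # jars the local JVM may load instead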

I do notice that if I run the script again after each of these runs on a pristine 
untar/install, I get slightly different initial output (the download and extract 
steps are skipped since the data is already in place) but the same exception:
[dev@devbox mahout-distribution-0.5]$ ./examples/bin/build-reuters.sh
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
MAHOUT_LOCAL is set, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/mahout-examples-0.5-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/dev/Desktop/mahout-distribution-0.5/examples/target/dependency/slf4j-jcl-1.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Jun 10, 2011 4:33:07 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, 
--endPhase=2147483647, 
--fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, 
--input=mahout-work/reuters-out, --keyPrefix=, 
--output=mahout-work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
Exception in thread "main" java.io.IOException: Call to 
hadoop1.eng.narus.com/172.31.2.200:8020 failed on local exception: 
java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
        at org.apache.hadoop.ipc.Client.call(Client.java:743)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.getProtocolVersion(Unknown Source)

There is no $MAHOUT_LOCAL in my environment, but I notice the script does set it 
internally. Something must be different in trunk, but I cannot find it.
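
In case it helps, this is how I'm trying to narrow down the MAHOUT_LOCAL handling 
(MAHOUT_TRUNK below is just a placeholder for my trunk checkout):

  # where "MAHOUT_LOCAL is set, running locally" comes from, and whether the
  # 0.5 script exports the variable itself
  grep -n "MAHOUT_LOCAL" bin/mahout examples/bin/build-reuters.sh
  # the same check against trunk, which works for me
  grep -n "MAHOUT_LOCAL" $MAHOUT_TRUNK/bin/mahout $MAHOUT_TRUNK/examples/bin/build-reuters.sh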

-----Original Message-----
From: Drew Farris [mailto:[email protected]]
Sent: Friday, June 10, 2011 2:57 PM
To: [email protected]
Subject: Re: Problems running examples

Hmm, I've been able to download the 0.5 src release and run it in
clustered mode. In most cases it completes fine. I ran into problems
once when I had left a mahout-work directory lying around from a
partially completed (aborted) run. I wonder if that could have
something to do with the failures you are seeing too, Jeff?
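
If a stale work dir is in the picture, it may be worth clearing both the local 
and HDFS copies before re-running (paths per the script's defaults; adjust if 
yours differ):

  rm -rf mahout-work
  $HADOOP_HOME/bin/hadoop fs -rmr mahout-work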

The binary release of 0.5 is most definitely broken, but that breakage
was discussed in another thread and is due to classpath issues in
bin/mahout vs. where things are placed in the binary release.

On Fri, Jun 10, 2011 at 12:34 PM, Jeff Eastman <[email protected]> wrote:
> I'm still trying to figure out why reuters-0.5 does not work on either of my 
> clusters. The scripts themselves have no diff and the environment variables 
> are set as in trunk except for MAHOUT_HOME. The synthetic control and 20 
> newsgroups examples run on both clusters without problems (well, 20 
> newsgroups has a Version Mismatch error on CDH3, but that is another story). 
> But when I run reuters on 0.5 I see "MAHOUT_LOCAL is set, running locally" 
> followed by file IO exceptions in MahoutDriver that are cluster dependent. 
> When I run it on trunk, I don't see this and it works just fine.
>
> -----Original Message-----
> From: Drew Farris [mailto:[email protected]]
> Sent: Thursday, June 09, 2011 5:36 PM
> To: [email protected]
> Subject: Re: Problems running examples
>
> Jeff, No impugning perceived, and thanks for running the variety of
> tests. So it appears that trunk is fine and 0.5 isn't. I'll try to
> determine what did (or didn't) make it into 0.5 that causes its
> brokenness.
>
> Mark, in the meantime, no need to run all of the tests I've asked
> about previously. Just give trunk a try and see if that resolves your
> problem.
>
> On Thu, Jun 9, 2011 at 7:21 PM, Jeff Eastman <[email protected]> wrote:
>> Hi Drew,
>>
>> Running trunk locally, latest update, just now, build-reuters.sh works 
>> (kmeans and lda).
>>
>> Running trunk on my CDH3 cluster, just now:
>> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
>> - build-reuters.sh works (with kmeans and lda)
>>
>> Running trunk on my MapR cluster, just now:
>> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
>> - build-reuters.sh works (with kmeans and lda)
>>
>>
>> Running the 5/31 mahout-distribution-0.5, just now:
>> - build-cluster-syntheticcontrol.sh works (CDH3 & MapR with kmeans and 
>> others)
>> - build-reuters.sh runs in local mode only (CDH3 & MapR runs give different 
>> errors)
>>
>> I was primarily defending kmeans. It is possible my 5/31 0.5 distribution is 
>> not the final one, since everything seems kosher in trunk now. My apologies if 
>> I've impugned your patch.
>>
>> Jeff
>>
>>
>> -----Original Message-----
>> From: Drew Farris [mailto:[email protected]]
>> Sent: Thursday, June 09, 2011 11:36 AM
>> To: [email protected]
>> Subject: Re: Problems running examples
>>
>> Jeff,
>>
>> Could you tell me about what's failing in KMeans and LDA when running
>> on a cluster? I had this working just prior to 0.5 in
>> https://issues.apache.org/jira/browse/MAHOUT-694
>>
>> Thanks,
>>
>> Drew
>>
>> On Thu, Jun 9, 2011 at 2:01 PM, Jeff Eastman <[email protected]> wrote:
>>> Ahem, KMeans is not busted. It is being maintained by me, at least. The 
>>> build-reuters.sh script runs only in local mode on 0.5 and fails in both 
>>> KMeans and LDA when run on a cluster. The MIA examples are not always 
>>> correct. Most of this has been reported before.
>>>
>>> -----Original Message-----
>>> From: Sean Owen [mailto:[email protected]]
>>> Sent: Thursday, June 09, 2011 12:29 AM
>>> To: [email protected]
>>> Subject: Re: Problems running examples
>>>
>>> (Assuming you are on HEAD,) I think KMeans is busted -- this has come up
>>> before. I don't know if it is being maintained.  Anyone who's willing to
>>> step up and fix it is also welcome to overhaul it IMHO.
>>>
>>> On Thu, Jun 9, 2011 at 12:03 AM, Hector Yee <[email protected]> wrote:
>>>
>>>> I got a slightly different error on the next line of KMeansDriver.java
>>>> (running on OS X Snow Leopard)
>>>>
>>>> 11/06/08 16:02:12 INFO compress.CodecPool: Got brand-new compressor
>>>> Exception in thread "main" java.lang.ClassCastException:
>>>> org.apache.hadoop.io.IntWritable cannot be cast to
>>>> org.apache.mahout.math.VectorWritable
>>>>  at
>>>>
>>>> org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
>>>> at
>>>> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)
>>>>
>>>>
>>>> On Sun, Jun 5, 2011 at 9:31 PM, Jeff Eastman <[email protected]> wrote:
>>>>
>>>> > IIRC, Reuters used to run on a cluster but no longer does due to some
>>>> > obscure Lucene changes. In 0.5 it only works in local mode. I really hope
>>>> > this can be repaired by 0.6 as Reuters is a key entry point into Mahout
>>>> > clustering for many users.
>>>> >
>>>>
>>>
>>
>
