Hmm, that's weird. I'd suggest doing a "clean install" first, as well as
deleting your examples/bin/work directory.
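
A minimal sketch of those two steps, assuming a Maven build run from the top of the Mahout checkout. The helper name clean_reuters_work is hypothetical and only for illustration; it is not part of build-reuters.sh.

```shell
# Hypothetical helper: remove the Reuters example's work directory
# under a Mahout checkout (defaults to the current directory).
clean_reuters_work() {
  mahout_dir="${1:-.}"
  rm -rf "$mahout_dir/examples/bin/work"
}

# Usage, from the top of the checkout:
#   mvn clean install      # full rebuild; assumes Maven is on the PATH
#   clean_reuters_work .   # drop stale downloads and extracted files
```

Removing the work directory forces the script to download and extract Reuters again on the next run.
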
On May 20, 2011, at 3:41 PM, Jeff Eastman wrote:
> I uncommented line 39 and am getting the same errors (index error with
> kmeans, 0 LL with LDA) as before. I am running on real clusters (CDH3 &
> MapR). Trying to run locally, I get this curious output. I don't have much
> time today to pursue it (in meetings all day) but will do my best:
>
> [dev@devbox mahout]$ ./examples/bin/build-reuters.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> Downloading Reuters-21578
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 7959k  100 7959k    0     0  1145k      0  0:00:06  0:00:06 --:--:-- 1135k
> Extracting...
> no HADOOP_HOME set, running locally
> May 20, 2011 12:35:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on
> classpath, will use command-line arguments only
> Deleting all files in ./examples/bin/work/reuters-out
> May 20, 2011 12:35:56 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 4690 ms
> no HADOOP_HOME set, running locally
> May 20, 2011 12:35:57 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5,
> --endPhase=2147483647,
> --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter,
> --input=./examples/bin/work/reuters-out/, --keyPrefix=,
> --output=./examples/bin/work/reuters-out-seqdir, --startPhase=0,
> --tempDir=temp}
> Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: /home/dev/workspace/mahout/examples/bin/work/reuters-out/reut2-018.sgm-835.txt (Too many open files)
> at org.apache.mahout.text.SequenceFilesFromDirectoryFilter.accept(SequenceFilesFromDirectoryFilter.java:79)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:724)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746)
> at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:76)
> at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> C
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:[email protected]]
> Sent: Friday, May 20, 2011 11:17 AM
> To: [email protected]
> Subject: Re: Is LDA Broken?
>
> yeah, sorry. I commented out line 39: cd examples/bin
>
> On May 20, 2011, at 1:58 PM, Jeff Eastman wrote:
>
>> It does seem these two symptoms are of the same problem. I applied the
>> patch, however, and now neither option runs. It appears the cd is off, but I
>> can't see where.
>>
>> [dev@devbox mahout-distribution-0.5]$ time ./examples/bin/build-reuters.sh
>> Please select a number to choose the corresponding clustering algorithm
>> 1. kmeans clustering
>> 2. lda clustering
>> Enter your choice : 1
>> ok. You chose 1 and we'll use kmeans Clustering
>> ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or directory
>> ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or directory
>>
>>
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:[email protected]]
>> Sent: Friday, May 20, 2011 10:50 AM
>> To: [email protected]
>> Subject: Re: Is LDA Broken?
>>
>> Likely so, see MAHOUT-694.
>>
>>
>> On May 20, 2011, at 1:39 PM, Sean Owen wrote:
>>
>>> Oh sorry these are the same issue? Great!
>>> On May 20, 2011 5:44 PM, "Jake Mannix" <[email protected]> wrote:
>>>> Looks like Grant got a fix posted? Has anyone else tried it?
>>>>
>>>> -jake
>>>>
>>>> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <[email protected]> wrote:
>>>>
>>>>> I think we definitely need to figure out whether it's a bug or some other
>>>>> confusion. If it's a doesn't-work-at-all bug yes probably the kind of thing
>>>>> that needs a fix ASAP in which case write up all you know and everyone will
>>>>> pile in to look at it.
>>>>>
>>>>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <[email protected]> wrote:
>>>>>
>>>>>> Is this an issue that should be fixed before we release? It seems to be
>>>>>> broken to me.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jeff Eastman [mailto:[email protected]]
>>>>>> Sent: Thursday, May 19, 2011 5:05 PM
>>>>>> To: [email protected]
>>>>>> Subject: Is LDA Broken?
>>>>>>
>>>>>> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>>>>>> without ever producing a non-zero Log Likelihood. This is not the behavior
>>>>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>>>>>
>>>>>>
>>>>>
>>
>>
>
> --------------------------------------------
> Grant Ingersoll
> Join the LUCENE REVOLUTION
> Lucene & Solr User Conference
> May 25-26, San Francisco
> www.lucenerevolution.org
>