Yep, I'm running trunk. However, I *think* that when I build utils (via mvn -B), it actually pulls the mahout-core/mahout-test jars from a repo, which seems strange to me...
2010/5/10 Jeff Eastman <j...@windwardsolutions.com>

> Sean posted something about that recently (5/5/10: Re: Installation problem
> in utils) in which he claims to have fixed it. At least, all the tests ran.
> But then, you are running the reuters script and that does not get exercised
> in the build. I suspect there are some more issues with the recent temp file
> allocation patch. Are you running trunk?
>
> On 5/10/10 1:43 PM, Florent Empis wrote:
>
>> Hi,
>>
>> It might help for the build part, but probably won't fix the 2nd issue?
>> The / is not writeable on most systems so creation of
>> /tokenized-documents/_temporary
>> will still fail?
>>
>> 2010/5/10 Jeff Eastman<j...@windwardsolutions.com>
>>
>>> Hi Florent,
>>>
>>> I successfully ran the new build-reuters.sh before I committed it this
>>> morning, so I suspect you must have some other problem in your system.
>>> Have you tried deleting your Maven repository (.m2) and doing a full
>>> mvn clean install?
>>>
>>> Jeff
>>>
>>> On 5/10/10 12:50 PM, Florent Empis wrote:
>>>
>>>> Hi,
>>>>
>>>> I've seen the commit from Robin this afternoon so I gave it another try.
>>>> Using the new shell I still run into a few problems.
>>>> At first, in order to satisfy a dependency on slf4j I've had to add the
>>>> following to examples/pom.xml (once again I'm not a maven expert, so
>>>> this may not be the correct way to do it)
>>>>
>>>> <dependency>
>>>>   <groupId>org.slf4j</groupId>
>>>>   <artifactId>slf4j-nop</artifactId>
>>>>   <version>1.5.8</version>
>>>>   <classifier>sources</classifier>
>>>> </dependency>
>>>>
>>>> Then, after a successful mvn -B
>>>> I've launched the shell:
>>>> flor...@florent-laptop:~/workspace/mahout$ ./examples/bin/build-reuters.sh
>>>>
>>>> It fails with the following error:
>>>> 10/05/10 21:28:06 WARN mapred.LocalJobRunner: job_local_0001
>>>> java.io.IOException: The temporary job-output directory
>>>> file:/tokenized-documents/_temporary doesn't exist!
>>>> at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:204)
>>>> at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:234)
>>>> at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:48)
>>>> at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:662)
>>>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:352)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>> 10/05/10 21:28:07 INFO mapred.JobClient: map 0% reduce 0%
>>>> 10/05/10 21:28:07 INFO mapred.JobClient: Job complete: job_local_0001
>>>> 10/05/10 21:28:07 INFO mapred.JobClient: Counters: 0
>>>> 10/05/10 21:28:07 ERROR driver.MahoutDriver: MahoutDriver failed with args:
>>>> [-i, ./examples/bin/work/reuters-out-seqdir/, -o,
>>>> ./examples/bin/work/reuters-out-seqdir-sparse, null]
>>>> Job failed!
>>>> Exception in thread "main" java.io.IOException: Job failed!
>>>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>>>> at org.apache.mahout.utils.vectors.text.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:97)
>>>> at org.apache.mahout.text.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:215)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
>>>>
>>>> A find makes me think that the issue is in
>>>> /utils/src/main/java/org/apache/mahout/utils/vectors/text/DocumentProcessor.java
>>>>
>>>> /utils/src/main/java/org/apache/mahout/utils/vectors/text/DocumentProcessor.java:
>>>> public static final String TOKENIZED_DOCUMENT_OUTPUT_FOLDER =
>>>> "/tokenized-documents";
>>>>
>>>> I tried changing this value, but it did not solve my problem, although I
>>>> did a mvn -B on utils afterwards.... it looks like the mahout-utils used
>>>> by the test comes from somewhere else: I guess there's something I'm
>>>> missing....
>>>>
>>>> 2010/5/10 Jeff Eastman<j...@windwardsolutions.com>
>>>>
>>>>> I will commit once I verify it completes. It's running now...
>>>>> Jeff
>>>>>
>>>>> On 5/10/10 7:50 AM, Robin Anil wrote:
>>>>>
>>>>>> +1. Should be using bin/mahout script for all these.
>>>>>>
>>>>>> Robin
>>>>>>
>>>>>> On Mon, May 10, 2010 at 8:12 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
>>>>>>
>>>>>>> Well, thanks for the info. Perhaps we should replace the script then.
>>>>>>> Leaving time bombs around like this is not good.
>>>>>>> Jeff
>>>>>>>
>>>>>>> On 5/10/10 7:32 AM, Robin Anil wrote:
>>>>>>>
>>>>>>>> That's been broken for a long time. It was used by David while he
>>>>>>>> developed LDA; it didn't get updated to work post 0.2. Use Sisir's
>>>>>>>> script to convert reuters to vectors, it's up on the wiki.
>>>>>>>>
>>>>>>>> Robin
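
For what it's worth, the leading slash in the TOKENIZED_DOCUMENT_OUTPUT_FOLDER constant quoted in the thread fits the file:/tokenized-documents/_temporary path in the error. I have not checked how DocumentProcessor actually combines that constant with the -o directory, so the following is only a minimal sketch of the general effect, using java.nio.file as a stand-in rather than Hadoop's own Path class. The class name, the resolve helper and the slash-free RELATIVE_FOLDER variant are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class AbsoluteChildPath {
    // Same value as the constant quoted in the thread: the leading slash makes it absolute.
    static final String ABSOLUTE_FOLDER = "/tokenized-documents";
    // Hypothetical variant without the leading slash (an assumption, not the Mahout source).
    static final String RELATIVE_FOLDER = "tokenized-documents";

    // Resolve a folder name against an output directory, the way a job might
    // derive a working directory from its -o argument.
    static Path resolve(String outputDir, String folder) {
        return Paths.get(outputDir).resolve(folder).normalize();
    }

    public static void main(String[] args) {
        String out = "examples/bin/work/reuters-out-seqdir-sparse";
        // An absolute child replaces the parent entirely on Unix-like systems.
        System.out.println(resolve(out, ABSOLUTE_FOLDER));
        // A relative child is appended to the parent.
        System.out.println(resolve(out, RELATIVE_FOLDER));
    }
}
```

On a Unix-like system the first call yields /tokenized-documents, directly under the usually non-writeable root, while the second stays under the output directory. Whether the same thing happens inside the Mahout job depends on how the path is assembled there, which this sketch does not show.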