Yep, I'm running trunk.
However, I *think* that when I try to build utils (via mvn -B) it's actually
pulling the mahout-core/mahout-test jars from a repository, which seems
strange to me...


2010/5/10 Jeff Eastman <j...@windwardsolutions.com>

> Sean posted something about that recently (5/5/10: Re: Installation problem
> in utils) in which he claims to have fixed it. At least, all the tests ran.
> But then, you are running the reuters script and that does not get exercised
> in the build. I suspect there are some more issues with the recent temp file
> allocation patch. Are you running trunk?
>
>
> On 5/10/10 1:43 PM, Florent Empis wrote:
>
>> Hi,
>>
>> It might help for the build part, but it probably won't fix the second issue?
>> / is not writable for ordinary users on most systems, so creation of
>> /tokenized-documents/_temporary will still fail?
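>>
>> To illustrate (a minimal sketch, assuming Hadoop's usual Path semantics;
>> PathDemo is just my name for it): the leading slash makes the path
>> absolute, so in local mode it resolves against file:/ rather than the
>> job's working directory:
>>
>>   import org.apache.hadoop.fs.Path;
>>
>>   public class PathDemo {
>>     public static void main(String[] args) {
>>       // Absolute because of the leading '/', hence file:/tokenized-documents
>>       // in local mode, a directory ordinary users cannot create.
>>       Path out = new Path("/tokenized-documents");
>>       System.out.println(out.isAbsolute()); // prints: true
>>     }
>>   }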
>>
>> 2010/5/10 Jeff Eastman<j...@windwardsolutions.com>
>>
>>> Hi Florent,
>>>
>>> I successfully ran the new build-reuters.sh before I committed it this
>>> morning, so I suspect you must have some other problem in your system.
>>> Have
>>> you tried deleting your Maven repository (.m2) and doing a full mvn clean
>>> install?
>>>
>>> Jeff
>>>
>>>
>>> On 5/10/10 12:50 PM, Florent Empis wrote:
>>>
>>>> Hi,
>>>>
>>>> I've seen the commit from Robin this afternoon, so I gave it another try.
>>>> Using the new shell I still run into a few problems.
>>>> At first, in order to satisfy a dependency on slf4j, I had to add the
>>>> following to examples/pom.xml (once again, I'm not a Maven expert, so this
>>>> may not be the correct way to do it):
>>>>
>>>> <dependency>
>>>>   <groupId>org.slf4j</groupId>
>>>>   <artifactId>slf4j-nop</artifactId>
>>>>   <version>1.5.8</version>
>>>>   <classifier>sources</classifier>
>>>> </dependency>
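>>>>
>>>> (Side note: if I understand Maven correctly, the sources classifier only
>>>> pulls in the source jar, so the plain slf4j-nop binding with no classifier
>>>> is probably what's actually needed. As a minimal illustration of what the
>>>> binding does, with Slf4jDemo being just an invented name: code compiles
>>>> against the slf4j API alone, and whichever binding sits on the runtime
>>>> classpath decides where log calls end up.)
>>>>
>>>>   import org.slf4j.Logger;
>>>>   import org.slf4j.LoggerFactory;
>>>>
>>>>   public class Slf4jDemo {
>>>>     private static final Logger LOG = LoggerFactory.getLogger(Slf4jDemo.class);
>>>>
>>>>     public static void main(String[] args) {
>>>>       // With slf4j-nop on the classpath this message is silently dropped;
>>>>       // with a real binding (e.g. slf4j-log4j12) it would be written out.
>>>>       LOG.info("hello from slf4j");
>>>>     }
>>>>   }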
>>>>
>>>> Then, after a successful mvn -B, I launched the shell:
>>>> flor...@florent-laptop:~/workspace/mahout$ ./examples/bin/build-reuters.sh
>>>>
>>>> It fails with the following error:
>>>> 10/05/10 21:28:06 WARN mapred.LocalJobRunner: job_local_0001
>>>> java.io.IOException: The temporary job-output directory
>>>> file:/tokenized-documents/_temporary doesn't exist!
>>>>   at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:204)
>>>>   at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:234)
>>>>   at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:48)
>>>>   at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:662)
>>>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:352)
>>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>> 10/05/10 21:28:07 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 10/05/10 21:28:07 INFO mapred.JobClient: Job complete: job_local_0001
>>>> 10/05/10 21:28:07 INFO mapred.JobClient: Counters: 0
>>>> 10/05/10 21:28:07 ERROR driver.MahoutDriver: MahoutDriver failed with args:
>>>> [-i, ./examples/bin/work/reuters-out-seqdir/, -o,
>>>> ./examples/bin/work/reuters-out-seqdir-sparse, null]
>>>> Job failed!
>>>> Exception in thread "main" java.io.IOException: Job failed!
>>>>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>>>>   at org.apache.mahout.utils.vectors.text.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:97)
>>>>   at org.apache.mahout.text.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:215)
>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>>   at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
>>>>
>>>> A find makes me think that the issue is in
>>>> /utils/src/main/java/org/apache/mahout/utils/vectors/text/DocumentProcessor.java:
>>>>
>>>>   public static final String TOKENIZED_DOCUMENT_OUTPUT_FOLDER =
>>>>       "/tokenized-documents";
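>>>>
>>>> For illustration, one way to stay off the file system root would presumably
>>>> be to make the folder name relative and resolve it under the job's output
>>>> directory. A rough sketch of that idea (my own names, not the actual patch):
>>>>
>>>>   import org.apache.hadoop.fs.Path;
>>>>
>>>>   public class DocumentProcessorSketch {
>>>>     // No leading '/', so the folder is no longer anchored at file:/
>>>>     public static final String TOKENIZED_DOCUMENT_OUTPUT_FOLDER =
>>>>         "tokenized-documents";
>>>>
>>>>     // Resolve it under the caller-supplied output directory instead.
>>>>     public static Path tokenizedOutput(Path outputDir) {
>>>>       return new Path(outputDir, TOKENIZED_DOCUMENT_OUTPUT_FOLDER);
>>>>     }
>>>>   }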
>>>>
>>>> I tried changing this value, but it did not solve my problem, although I
>>>> did a mvn -B on utils afterwards... it looks like the mahout-utils jar used
>>>> by the script comes from somewhere else: I guess there's something I'm
>>>> missing...
>>>>
>>>> 2010/5/10 Jeff Eastman<j...@windwardsolutions.com>
>>>>
>>>>> I will commit once I verify it completes.  It's running now...
>>>>> Jeff
>>>>>
>>>>>
>>>>> On 5/10/10 7:50 AM, Robin Anil wrote:
>>>>>
>>>>>> +1. Should be using bin/mahout script for all these.
>>>>>>
>>>>>>
>>>>>> Robin
>>>>>>
>>>>>> On Mon, May 10, 2010 at 8:12 PM, Jeff Eastman<j...@windwardsolutions.com> wrote:
>>>>>>
>>>>>>> Well, thanks for the info. Perhaps we should replace the script then.
>>>>>>> Leaving time bombs around like this is not good.
>>>>>>> Jeff
>>>>>>>
>>>>>>>
>>>>>>> On 5/10/10 7:32 AM, Robin Anil wrote:
>>>>>>>
>>>>>>>> That's been broken for a long time; it was used by David while he
>>>>>>>> developed LDA and didn't get updated to work post-0.2. Use Sisir's
>>>>>>>> script to convert Reuters to vectors; it's up on the wiki.
>>>>>>>>
>>>>>>>> Robin
>>>>>>>>