Hi,

It might help with the build part, but it probably won't fix the second issue, will it?
The root directory (/) is not writable on most systems, so creating
/tokenized-documents/_temporary
will presumably still fail.
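
A minimal sketch of one way a leading slash could produce that path. It assumes the folder name is resolved URI-style against the job output directory; that is an assumption, not something verified against how DocumentProcessor actually builds the path. The class OutputPathSketch, the helper resolveChild, and the example output directory are hypothetical names used only for illustration.

```java
import java.net.URI;

public class OutputPathSketch {

    // Hypothetical helper: resolve a child folder name against an output
    // directory using standard URI resolution (java.net.URI#resolve).
    public static URI resolveChild(String outputDir, String child) {
        // Treat the base as a directory by making sure it ends with '/'.
        String base = outputDir.endsWith("/") ? outputDir : outputDir + "/";
        return URI.create(base).resolve(child);
    }

    public static void main(String[] args) {
        // Example output directory (an assumption, not taken from the logs).
        String out = "file:/home/user/work/reuters-out-seqdir-sparse";

        // A child starting with '/' replaces the whole base path.
        System.out.println(resolveChild(out, "/tokenized-documents"));

        // A relative child stays under the output directory.
        System.out.println(resolveChild(out, "tokenized-documents"));
    }
}
```

If the path really is built this way, dropping the leading slash from the folder name would keep the temporary directory under the job output directory instead of under /.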

2010/5/10 Jeff Eastman <j...@windwardsolutions.com>

> Hi Florent,
>
> I successfully ran the new build-reuters.sh before I committed it this
> morning, so I suspect you must have some other problem in your system. Have
> you tried deleting your Maven repository (.m2) and doing a full mvn clean
> install?
>
> Jeff
>
>
> On 5/10/10 12:50 PM, Florent Empis wrote:
>
>> Hi,
>>
>> I saw Robin's commit this afternoon, so I gave it another try.
>> Using the new shell script, I still run into a few problems.
>> First, to satisfy a dependency on slf4j, I had to add the
>> following to examples/pom.xml (once again, I'm not a Maven expert, so this
>> may not be the correct way to do it):
>>
>> <dependency>
>>   <groupId>org.slf4j</groupId>
>>   <artifactId>slf4j-nop</artifactId>
>>   <version>1.5.8</version>
>>   <classifier>sources</classifier>
>> </dependency>
>>
>> Then, after a successful mvn -B,
>> I launched the script:
>> flor...@florent-laptop:~/workspace/mahout$
>> ./examples/bin/build-reuters.sh
>>
>> It fails with the following error:
>> 10/05/10 21:28:06 WARN mapred.LocalJobRunner: job_local_0001
>> java.io.IOException: The temporary job-output directory file:/tokenized-documents/_temporary doesn't exist!
>> at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:204)
>> at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:234)
>> at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:48)
>> at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:662)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:352)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> 10/05/10 21:28:07 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/05/10 21:28:07 INFO mapred.JobClient: Job complete: job_local_0001
>> 10/05/10 21:28:07 INFO mapred.JobClient: Counters: 0
>> 10/05/10 21:28:07 ERROR driver.MahoutDriver: MahoutDriver failed with args:
>> [-i, ./examples/bin/work/reuters-out-seqdir/, -o, ./examples/bin/work/reuters-out-seqdir-sparse, null]
>> Job failed!
>> Exception in thread "main" java.io.IOException: Job failed!
>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>> at org.apache.mahout.utils.vectors.text.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:97)
>> at org.apache.mahout.text.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:215)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
>>
>> A find makes me think that the issue is in
>> /utils/src/main/java/org/apache/mahout/utils/vectors/text/DocumentProcessor.java
>>
>> /utils/src/main/java/org/apache/mahout/utils/vectors/text/DocumentProcessor.java:
>>  public static final String TOKENIZED_DOCUMENT_OUTPUT_FOLDER =
>> "/tokenized-documents";
>>
>> I tried changing this value, but it did not solve my problem, even though I
>> ran mvn -B on utils afterwards. It looks like the mahout-utils used by the
>> test comes from somewhere else. I guess there's something I'm missing.
>>
>> 2010/5/10 Jeff Eastman <j...@windwardsolutions.com>
>>
>>> I will commit once I verify it completes.  It's running now...
>>> Jeff
>>>
>>>
>>> On 5/10/10 7:50 AM, Robin Anil wrote:
>>>
>>>
>>>
>>>> +1. We should be using the bin/mahout script for all of these.
>>>>
>>>>
>>>> Robin
>>>>
>>>>
>>>> On Mon, May 10, 2010 at 8:12 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
>>>>
>>>>> Well, thanks for the info. Perhaps we should replace the script then.
>>>>> Leaving time bombs around like this is not good.
>>>>> Jeff
>>>>>
>>>>>
>>>>> On 5/10/10 7:32 AM, Robin Anil wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> That's been broken for a long time. It was used by David while he
>>>>>> developed LDA, and it didn't get updated to work post-0.2. Use Sisir's
>>>>>> script to convert Reuters to vectors; it's up on the wiki.
>>>>>>
>>>>>> Robin
