Yes, it was updated recently. It's here:
http://cwiki.apache.org/MAHOUT/twentynewsgroups.html



On Tue, Feb 9, 2010 at 1:48 PM, Loek Cleophas <[email protected]> wrote:

> Hi Robin,
>
> Thank you, that was definitely enough. I ran the PrepareTwentyNewsgroups
> task using the mvn exec command you suggested now (seems it's time for me to
> read up on Maven - useful how it takes care of finding the includes etc.).
> For training and testing, I'm using hadoop directly, which works fine.
>
> This is probably already on your/someone's to do list, but it might be a
> good idea to update the wiki page describing the example, so that it deals
> with 0.2 or the trunk vs. some pre 0.2 release version (?). I know, you
> probably have enough to work on without that..
>
> Regards,
> Loek
>
>
> On Feb 7, 2010, at 13:48, Robin Anil wrote:
>
>> Are the mvn exec commands for running the 20-newsgroups example enough? I
>> haven't used Ant for a while (read: 8 months), and Mahout has shifted to
>> Maven anyway.
>>
>> So here goes. In the examples directory:
>>
>> $ tar zxf 20news-18828.tar.gz
>> $ mkdir 20news-input
>> $ mvn -e exec:java \
>>     -Dexec.mainClass=org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
>>     -Dexec.args="-p 20news-18828 -o 20news-input -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8"
>> To train:
>> $ mvn -e exec:java \
>>     -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
>>     -Dexec.args="-i 20news-input -o 20news-model -type cbayes -ng 1 -source hdfs"
>> To test:
>> $ mvn -e exec:java \
>>     -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
>>     -Dexec.args="-m 20news-model -d 20news-input -type cbayes -ng 1 -source hdfs -method sequential"
>>
>>
>>
>> On Sun, Feb 7, 2010 at 2:26 PM, Loek Cleophas <[email protected]> wrote:
>>
>>> Hi
>>>
>>> A few weeks ago, after some toiling, I managed to get the input data for
>>> the 20 newsgroups example into the format used by the Bayes classifiers in
>>> Mahout. I did this on the trunk, and remember that it took some tricks, in
>>> particular to get the PrepareTwentyNewsgroups code to run on the expanded
>>> data and extract/collapse it into the format used by Mahout's Bayes
>>> classifiers.
>>>
>>> For some reason now beyond me, I removed that copy of the trunk with the
>>> example data. Now I'm trying to redo the same (albeit this time on release
>>> 0.2), but am having trouble. I copied the maven/build.xml into
>>> examples/build.xml according to a September post on the user group
>>> (http://old.nabble.com/20-newsgroups-example-td25235941.html). That post
>>> also suggested modifying the file, i.e. taking out the reference
>>> <classpath refid="maven.test.classpath"/> (which indeed is not recognized
>>> when I run the extract-20news-18828 ant target), and adding the following
>>> lines:
>>>
>>>    <classpath>
>>>        <path id="lib.path.ref">
>>>          <fileset dir="target" includes="*.jar"/>
>>>        </path>
>>>        <path id="lib.path.ref">
>>>          <fileset dir="lib" includes="*.jar"/>
>>>        </path>
>>>    </classpath>
>>>
>>> The "target" one makes some sense, but the lib one does not - I don't see
>>> any lib folder in my mahout-0.2 checkout (even after having done the mvn
>>> install of core and mvn compile of examples). Can anyone (Robin?) tell me
>>> what lines to add instead to get the Ant task to work? I know I managed to
>>> get it working before on my own, but can't remember for the life of me how
>>> I did it :-\
>>>
>>> Regards,
>>> Loek
>>>
>>
>
