Are the mvn exec commands for running the 20-newsgroups example enough? I haven't
used Ant for a while (read: 8 months), and Mahout has shifted to Maven anyway.

So here goes. In the examples directory:

$ tar zxf 20news-18828.tar.gz
$ mkdir 20news-input
$ mvn -e exec:java \
    -Dexec.mainClass=org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
    -Dexec.args="-p 20news-18828 -o 20news-input -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8"
To train:
$ mvn -e exec:java \
    -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
    -Dexec.args="-i 20news-input -o 20news-model -type cbayes -ng 1 -source hdfs"
To test:
$ mvn -e exec:java \
    -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
    -Dexec.args="-m 20news-model -d 20news-input -type cbayes -ng 1 -source hdfs -method sequential"
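
If it helps, the same steps can be chained in a small script. This is only a sketch, assuming 20news-18828.tar.gz is already in the examples directory; the `run` helper and the `RUN` variable are names made up for illustration, not part of Mahout. By default it only prints each command so you can check it first; set RUN=1 to actually execute.

```shell
#!/bin/sh
# Sketch: extract, prepare, train and test in one go (run from examples/).
# Prints the commands by default; RUN=1 executes them.
set -eu

PKG=org.apache.mahout.classifier.bayes

# Hypothetical helper: execute the command if RUN=1, otherwise just print it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

run tar zxf 20news-18828.tar.gz
# -p (not in the commands above) keeps a re-run from failing if the directory exists.
run mkdir -p 20news-input

# Collapse the extracted newsgroup files into the input format for the Bayes classifiers.
run mvn -e exec:java \
  "-Dexec.mainClass=$PKG.PrepareTwentyNewsgroups" \
  "-Dexec.args=-p 20news-18828 -o 20news-input -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8"

# Train the cbayes model.
run mvn -e exec:java \
  "-Dexec.mainClass=$PKG.TrainClassifier" \
  "-Dexec.args=-i 20news-input -o 20news-model -type cbayes -ng 1 -source hdfs"

# Test the model against the same input.
run mvn -e exec:java \
  "-Dexec.mainClass=$PKG.TestClassifier" \
  "-Dexec.args=-m 20news-model -d 20news-input -type cbayes -ng 1 -source hdfs -method sequential"
```

Because of `set -e`, with RUN=1 the script stops at the first step that fails, so training is not attempted on a broken input directory.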



On Sun, Feb 7, 2010 at 2:26 PM, Loek Cleophas <[email protected]> wrote:

> Hi
>
> A few weeks ago, after some toiling, I managed to get the input data for
> the 20 newsgroups example into the format used by the Bayes classifiers in
> Mahout. I did this on the trunk, and remember that it took some tricks in
> particular to get the PrepareTwentyNewsgroups code to run on the expanded
> data and extract/collapse it into the format used by Mahout's Bayes
> classifiers.
>
> For some reason now beyond me, I removed that copy of the trunk with the
> example data. Now, I'm trying to redo the same (albeit this time on release
> 0.2), but am having trouble. I copied the maven/build.xml into
> examples/build.xml according to a September post on the user group (
> http://old.nabble.com/20-newsgroups-example-td25235941.html). That post
> also suggested modifying the file, i.e. taking out the reference
> <classpath refid="maven.test.classpath"/> (which indeed is not recognized when
> I run the extract-20news-18828 ant target), and adding the following lines:
>
>      <classpath>
>          <path id="lib.path.ref">
>            <fileset dir="target" includes="*.jar"/>
>          </path>
>          <path id="lib.path.ref">
>            <fileset dir="lib" includes="*.jar"/>
>          </path>
>      </classpath>
>
> The "target" one makes some sense, but the lib one does not - I don't see
> any lib folder in my mahout-0.2 checkout (even after having done the mvn
> install of core and mvn compile of examples). Can anyone (Robin?) tell me
> what lines to add instead to get the Ant task to work? I know I managed to
> get it working before on my own, but can't remember for the life of me how I
> did it :-\
>
> Regards,
> Loek
