Yes, it was updated recently. It's here: http://cwiki.apache.org/MAHOUT/twentynewsgroups.html
On Tue, Feb 9, 2010 at 1:48 PM, Loek Cleophas wrote:

> Hi Robin,
>
> Thank you, that was definitely enough. I've now run the PrepareTwentyNewsgroups
> task using the mvn exec command you suggested (it seems it's time for me to
> read up on Maven; it's useful how it takes care of finding the includes etc.).
> For training and testing, I'm using Hadoop directly, which works fine.
>
> This is probably already on your/someone's to-do list, but it might be a
> good idea to update the wiki page describing the example, so that it deals
> with 0.2 or the trunk rather than some pre-0.2 release (?). I know you
> probably have enough to work on without that.
>
> Regards,
> Loek
>
>
> On Feb 7, 2010, at 13:48, Robin Anil wrote:
>
>> Are the mvn exec commands to run the 20 Newsgroups example enough? I haven't
>> used Ant for a while (read: 8 months), and Mahout has shifted to Maven
>> anyway.
>>
>> So here goes. In the examples directory:
>>
>> $ tar zxf 20news-18828.tar.gz
>> $ mkdir 20news-input
>> $ mvn -e exec:java \
>>     -Dexec.mainClass=org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
>>     -Dexec.args="-p 20news-18828 -o 20news-input -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8"
>>
>> To train:
>> $ mvn -e exec:java \
>>     -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
>>     -Dexec.args="-i 20news-input -o 20news-model -type cbayes -ng 1 -source hdfs"
>>
>> To test:
>> $ mvn -e exec:java \
>>     -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
>>     -Dexec.args="-m 20news-model -d 20news-input -type cbayes -ng 1 -source hdfs -method sequential"
>>
>>
>> On Sun, Feb 7, 2010 at 2:26 PM, Loek Cleophas wrote:
>>
>>> Hi,
>>>
>>> A few weeks ago, after some toiling, I managed to get the input data for
>>> the 20 Newsgroups example into the format used by the Bayes classifiers in
>>> Mahout. I did this on the trunk, and I remember that it took some tricks, in
>>> particular to get the PrepareTwentyNewsgroups code to run on the expanded
>>> data and extract/collapse it into the format used by Mahout's Bayes
>>> classifiers.
>>>
>>> For some reason now beyond me, I removed that copy of the trunk with the
>>> example data. Now I'm trying to redo the same (albeit this time on release
>>> 0.2), but am having trouble. I copied maven/build.xml to
>>> examples/build.xml according to a September post on the user list
>>> (http://old.nabble.com/20-newsgroups-example-td25235941.html). That post
>>> also suggested modifying the file, i.e. taking out the reference
>>> <classpath refid="maven.test.classpath"/> (which indeed is not recognized
>>> when I run the extract-20news-18828 Ant target), and adding the following
>>> lines:
>>>
>>> <classpath>
>>>   <path id="lib.path.ref">
>>>     <fileset dir="target" includes="*.jar"/>
>>>   </path>
>>>   <path id="lib.path.ref">
>>>     <fileset dir="lib" includes="*.jar"/>
>>>   </path>
>>> </classpath>
>>>
>>> The "target" one makes some sense, but the "lib" one does not: I don't see
>>> any lib folder in my mahout-0.2 checkout (even after having done the mvn
>>> install of core and mvn compile of examples). Can anyone (Robin?) tell me
>>> what lines to add instead to get the Ant task to work? I know I managed to
>>> get it working before on my own, but can't remember for the life of me
>>> how I did it :-\
>>>
>>> Regards,
>>> Loek
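For anyone re-running this, the Maven steps Robin lists can be chained in one shell script. This is a minimal sketch, not something posted in the thread: it assumes you run it from Mahout's examples/ directory with 20news-18828.tar.gz already there, and the function names run_step and run_20news_example, plus the DRY_RUN switch, are hypothetical conveniences. The class names and -Dexec.args values are copied from Robin's message and apply to the 0.2-era code.

```shell
#!/bin/sh
# Sketch only: chains the 20 Newsgroups steps quoted in this thread.
# Assumptions (not from the thread): run from Mahout's examples/ directory,
# 20news-18828.tar.gz is present there, and these function names are hypothetical.

# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_20news_example() {
  pkg=org.apache.mahout.classifier.bayes
  # Unpack the corpus; mkdir -p so a re-run does not fail on an existing dir.
  run_step tar zxf 20news-18828.tar.gz || return 1
  run_step mkdir -p 20news-input || return 1
  # Prepare: collapse the newsgroup files into the Bayes classifier input format.
  run_step mvn -e exec:java -Dexec.mainClass="$pkg.PrepareTwentyNewsgroups" \
    -Dexec.args="-p 20news-18828 -o 20news-input -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8" || return 1
  # Train: Complementary Naive Bayes (cbayes) on unigrams (-ng 1).
  run_step mvn -e exec:java -Dexec.mainClass="$pkg.TrainClassifier" \
    -Dexec.args="-i 20news-input -o 20news-model -type cbayes -ng 1 -source hdfs" || return 1
  # Test: evaluate the model on the prepared input in sequential mode.
  run_step mvn -e exec:java -Dexec.mainClass="$pkg.TestClassifier" \
    -Dexec.args="-m 20news-model -d 20news-input -type cbayes -ng 1 -source hdfs -method sequential"
}
```

Source the file and call run_20news_example. Running it once as `DRY_RUN=1 run_20news_example` prints the five commands so you can check them before starting the real run; each step stops the chain if it fails.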
