Hi
A few weeks ago, after some toiling, I managed to get the input data
for the 20 Newsgroups example into the format used by the Bayes
classifiers in Mahout. I did this on trunk, and I remember that it
took some tricks, in particular to get the PrepareTwentyNewsgroups code
to run on the expanded data and extract/collapse it into the format
those classifiers expect.
For some reason now beyond me, I removed that copy of the trunk with
the example data. Now, I'm trying to redo the same (albeit this time
on release 0.2), but am having trouble. I copied the maven/build.xml
into examples/build.xml according to a September post on the user
group (http://old.nabble.com/20-newsgroups-example-td25235941.html).
That post also suggested modifying the file, i.e. taking out the
reference <classpath refid="maven.test.classpath"/> (which indeed is not
recognized when I run the extract-20news-18828 Ant target), and adding
the following lines:
<classpath>
  <path id="lib.path.ref">
    <fileset dir="target" includes="*.jar"/>
  </path>
  <path id="lib.path.ref">
    <fileset dir="lib" includes="*.jar"/>
  </path>
</classpath>
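For comparison, the more usual Ant idiom is to define the path once and refer to it by id, rather than declaring two nested <path> elements that reuse the same id. The sketch below is only an illustration of that idiom, not a confirmed fix for the Mahout build: "example.classpath" is a hypothetical id, the target/ and lib/ directories are simply the ones from the September post, and erroronmissingdir requires Ant 1.7.1 or later.

```xml
<!-- Sketch only: define the path once, at project level in examples/build.xml.
     "example.classpath" is a hypothetical id; the directories are the ones
     suggested in the September post. -->
<path id="example.classpath">
  <!-- jars produced by the Maven build -->
  <fileset dir="target" includes="*.jar"/>
  <!-- erroronmissingdir="false" (Ant 1.7.1+) turns a missing lib/ folder
       into an empty fileset instead of a build error -->
  <fileset dir="lib" includes="*.jar" erroronmissingdir="false"/>
</path>

<!-- Then, inside the task that runs the extraction, reference the path by id
     in place of the removed maven.test.classpath reference: -->
<classpath refid="example.classpath"/>
```

This only changes how the classpath is declared. It does not answer which jars actually need to be on it.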
The "target" one makes some sense, but the lib one does not: I don't
see any lib folder in my mahout-0.2 checkout (even after running
mvn install for core and mvn compile for examples). Can anyone
(Robin?) tell me which lines to add instead to get the Ant task
working? I know I managed to get it working before on my own, but I can't
remember for the life of me how I did it :-\
Regards,
Loek