[
https://issues.apache.org/jira/browse/MAHOUT-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732435#action_12732435
]
Robin Anil commented on MAHOUT-146:
-----------------------------------
It's already generic to some extent. Check out MAHOUT-60.
The usage for creating a dataset for BayesClassifier is like this:
{noformat}
hadoop jar build/apache-mahout-0.1-dev-ex.jar \
  org.apache.mahout.examples.classifiers.cbayes.WikipediaDatasetCreator \
  -i wikipediadump -o wikipediainput -c pathto/country.txt
{noformat}
-c is the file with the list of categories (Wikipedia categories), so you can
specify anything there, as long as it is a Wikipedia category. When I go
through the XML dump, in the Map stage, for every article I match the list of
categories against the categories the document is in and output the article if
a match occurs.
For example, you could create two categories by adding the following to the
categories file:
Scientists of 1900
Scientists of 2000
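
To make the Map-stage matching concrete, here is a minimal standalone sketch of the idea. It is not the actual WikipediaDatasetCreator code: the class name CategoryMatchSketch, its methods, the case-insensitive exact matching, and the assumption that articles declare categories with [[Category:...]] links are all illustrative choices on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of Mahout.
public class CategoryMatchSketch {

    // Matches wiki markup such as [[Category:Scientists of 1900]]
    // or [[Category:Scientists of 1900|sort key]].
    private static final Pattern CATEGORY_LINK =
        Pattern.compile("\\[\\[Category:([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    /** Normalizes the wanted categories, one per line as in the -c file. */
    public static Set<String> normalize(List<String> lines) {
        Set<String> wanted = new LinkedHashSet<>();
        for (String line : lines) {
            String trimmed = line.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                wanted.add(trimmed);
            }
        }
        return wanted;
    }

    /** Extracts the category names an article declares in its wiki text. */
    public static List<String> extractCategories(String articleText) {
        List<String> found = new ArrayList<>();
        Matcher m = CATEGORY_LINK.matcher(articleText);
        while (m.find()) {
            found.add(m.group(1).trim().toLowerCase(Locale.ROOT));
        }
        return found;
    }

    /** Returns the first wanted category the article is in, or null if none matches. */
    public static String findMatch(String articleText, Set<String> wanted) {
        for (String category : extractCategories(articleText)) {
            if (wanted.contains(category)) {
                return category;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> wanted =
            normalize(Arrays.asList("Scientists of 1900", "Scientists of 2000"));
        String article = "Some text. [[Category:Scientists of 1900]] [[Category:Physics]]";
        // A mapper would emit (category, article) here when the result is non-null.
        System.out.println(findMatch(article, wanted));
    }
}
```

In a real job the wanted set would be loaded once per mapper from the categories file, and an article with no match would simply be skipped.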
> Make Wikipedia Example Classifier more generic
> ----------------------------------------------
>
> Key: MAHOUT-146
> URL: https://issues.apache.org/jira/browse/MAHOUT-146
> Project: Mahout
> Issue Type: Improvement
> Components: Classification
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 0.2
>
>
> It would be nice if the Wikipedia classifier example was a bit more generic
> instead of taking just countries. For example, one could classify based on
> other types of categories, such as things like "subjects", i.e. History,
> Math, Science or other things.