Very cool. It looks like I will have some time on Friday to review your
(and Deneche's) changes and hopefully get them committed. Then we
should be able to push out a release and start collecting feedback.
-Grant
On Aug 12, 2008, at 8:11 PM, Robin Anil (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-60:
-----------------------------
Attachment: MAHOUT-60-13082008.patch
I have merged the BayesClassifier and CBayesClassifier. Both now share
some common Map-Reduce operations. The classifier-specific Map-Reduce
operations are factored out.
The Model is also factored out.
The new feature in this patch is an n-gram generator, enabled with the
CLI parameter -ng <gram-size>
If a model is built using 3-grams, you can classify with 1-, 2-, or
3-grams.
Try increasing the n-gram size and see how the classification accuracy
grows with it.
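To make the n-gram idea concrete, here is a minimal Java sketch. It is not the code from the patch: the class name NGramSketch and the method ngrams are hypothetical, and it assumes the generator emits every contiguous gram of size 1 up to the requested gram size, which would be consistent with a 3-gram model also supporting 1- and 2-gram classification. The actual implementation in MAHOUT-60 may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the MAHOUT-60 patch.
public class NGramSketch {

    /**
     * Returns every contiguous n-gram of size 1..maxGramSize,
     * with the tokens of each gram joined by a single space.
     * Emitting all sizes up to maxGramSize is an assumption.
     */
    static List<String> ngrams(List<String> tokens, int maxGramSize) {
        List<String> grams = new ArrayList<>();
        for (int size = 1; size <= maxGramSize; size++) {
            // Slide a window of the current size across the token list.
            for (int start = 0; start + size <= tokens.size(); start++) {
                grams.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        // Prints the 4 unigrams, 3 bigrams and 2 trigrams of the input.
        System.out.println(ngrams(tokens, 3));
    }
}
```

With four tokens and a gram size of 3, the sketch yields nine features instead of four, which is one plausible reason a larger gram size could give the classifier more to work with, at the cost of a bigger model.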
cbayes.TestTwentyNewsgroups is renamed to bayes.TestClassifier
cbayes.TrainTwentyNewsgroups is renamed to bayes.TrainClassifier
The existing tests will fail when using this patch, so don't worry. New
tests will be put up shortly.
{noformat}
//To Train a Bayes Classifier using tri-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar
org.apache.mahout.examples.classifiers.bayes.TrainClassifier -t -i
newstrain -o newsmodel -ng 3 -type bayes
//To Test a Bayes Classifier using tri-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar
org.apache.mahout.examples.classifiers.bayes.TestClassifier -p
newsmodel -t work/newstest -ng 3 -type bayes
//To Train a CBayes Classifier using bi-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar
org.apache.mahout.examples.classifiers.bayes.TrainClassifier -t -i
newstrain -o newsmodel -ng 2 -type cbayes
//To Test a CBayes Classifier using bi-grams
hadoop jar build/apache-mahout-0.1-dev-ex.jar
org.apache.mahout.examples.classifiers.bayes.TestClassifier -p
newsmodel -t work/newstest -ng 2 -type cbayes
{noformat}
Hope you will enjoy using this patch.