[
https://issues.apache.org/jira/browse/MAHOUT-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-92:
-----------------------------
Attachment: MAHOUT-92.patch
{quote}
To test:
hadoop jar build/apache-mahout-examples-0.1-dev.job
org.apache.mahout.classifier.bayes.TestClassifier -p 20newsmodel -t
../core/work/20news-18828-collapse/ -ng 1 -type cbayes -a
org.apache.lucene.analysis.standard.StandardAnalyzer -d default -e UTF-8
{/quote}
Some lines were missing from the very last patch I submitted in MAHOUT-60.
BayesFeatureMapper wasn't creating any output.
This patch fixes that. I also ran training and testing over 20 newsgroups.
Everything seems to be working at the moment.
Why are the encoding and the analyzer required options on the command line?
I feel they should be optional.
The same goes for the default category. The classifier returns the first
category if all the categories have the same score or a score of zero. I don't
see any problem with that.
Any thoughts?
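
To make the tie-breaking point concrete, here is a minimal sketch of what "return the first category on a tie" could look like. This is not the code in the patch: the class name, the method name, and the assumption that a higher score wins are all hypothetical and only illustrate the behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TieBreakSketch {
    // Hypothetical helper, not Mahout's API. Assumes a higher score is better
    // and that the map iterates in a stable order (e.g. LinkedHashMap).
    public static String bestCategory(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            // Strict '>' means a later category with an equal score
            // never replaces an earlier one, so the first category wins ties.
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best; // null only when the map is empty
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("alt.atheism", 0.0);
        scores.put("comp.graphics", 0.0);
        scores.put("sci.space", 0.0);
        System.out.println(bestCategory(scores)); // prints "alt.atheism"
    }
}
```

With this behaviour there is always some answer when at least one category exists, which is why a separate default category looks unnecessary to me.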
> BayesFeatureMapper doesn't properly extract features
> ----------------------------------------------------
>
> Key: MAHOUT-92
> URL: https://issues.apache.org/jira/browse/MAHOUT-92
> Project: Mahout
> Issue Type: Bug
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 0.1
>
> Attachments: MAHOUT-92.patch
>
>
> The BayesFeatureMapper currently has a bunch of unused variables and doesn't
> actually do anything. The problem is it is not using the input value to
> generate a set of n-grams, from which it can then generate tf-idf information.
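
For reference, the n-gram step described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions: it splits on whitespace instead of using a Lucene analyzer, it emits only n-grams of exactly the requested size, and the class and method names are made up and are not part of BayesFeatureMapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NGramSketch {
    // Hypothetical helper: returns all word n-grams of exactly size n,
    // using lower-casing and whitespace tokenization as a stand-in for an analyzer.
    public static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n < 1) {
            return out;
        }
        String[] tokens = trimmed.toLowerCase().split("\\s+");
        // Slide a window of n tokens over the token array.
        for (int i = 0; i + n <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [the quick, quick brown, brown fox]
        System.out.println(ngrams("The quick brown fox", 2));
    }
}
```

Each n-gram produced this way would then be a feature whose counts feed the tf-idf computation mentioned in the issue.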