[
https://issues.apache.org/jira/browse/MAHOUT-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644534#action_12644534
]
Grant Ingersoll commented on MAHOUT-92:
---------------------------------------
{quote}
I also ran a train and a test over 20 newsgroups. Everything seems to be working
at the moment.
{quote}
Can you share how you are running it? When I run it, the job completes, but
every result is "unknown". If you have time, please update
http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups with the
steps.
I'm looking at the BayesClassifier class, and frankly I no longer understand how
it works, especially this code:
{code}
for (String category : categories) {
  double prob = documentProbability(model, category, document);
  if (prob < min) {
    min = prob;
    result.setLabel(category);
  }
}
{code}
The min value starts at 0, and a probability should lie between 0 and 1, so how
could {{prob < min}} ever be true and the label ever get set?
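A minimal sketch of the selection problem, assuming (hypothetically) that a lower score is better and that the label defaults to "unknown" when it is never set, which would be consistent with the all-"unknown" results reported above. The class and method names are made up for illustration and are not the BayesClassifier API.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not the BayesClassifier API: per-category
// scores where LOWER is better (e.g. a negative log-likelihood).
public class LabelSelection {

    public static final String UNKNOWN = "unknown";

    // Same shape as the quoted loop: min starts at 0, so a positive score
    // never passes "score < min" and the label stays UNKNOWN.
    public static String argminStartingAtZero(Map<String, Double> scores) {
        return argmin(scores, 0.0);
    }

    // Fixed version: start at +infinity so any finite score replaces it.
    public static String argmin(Map<String, Double> scores) {
        return argmin(scores, Double.POSITIVE_INFINITY);
    }

    private static String argmin(Map<String, Double> scores, double initialMin) {
        double min = initialMin;
        String label = UNKNOWN;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                label = e.getKey();
            }
        }
        return label;
    }

    public static void main(String[] args) {
        // Made-up scores, for illustration only.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("comp.graphics", 412.7);
        scores.put("rec.autos", 398.2);
        scores.put("sci.space", 405.9);
        System.out.println(argminStartingAtZero(scores)); // unknown
        System.out.println(argmin(scores));               // rec.autos
    }
}
{code}

Whether argmin is the right rule at all depends on what documentProbability actually returns; for a true probability the loop would instead keep the maximum, starting from Double.NEGATIVE_INFINITY with a {{>}} comparison.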
Additionally, the values returned for prob are much larger than one. That's
fine if they are supposed to be, but then we shouldn't call them probabilities.
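One plausible reason for values much larger than one, offered as an assumption rather than a reading of the Mahout code: naive Bayes implementations usually work in log space to avoid floating-point underflow, and a sum of negative log-probabilities is a non-negative cost that easily exceeds 1. The sketch below uses hypothetical names and made-up inputs.

{code:java}
import java.util.Arrays;

// Hypothetical sketch, not Mahout's code: scoring a document for one
// category as a sum of negative log-probabilities.
public class NaiveBayesScore {

    // wordProbs[i] is an assumed, already-smoothed P(word_i | category),
    // with 0 < p <= 1, so each -log(p) term is >= 0.
    public static double negativeLogLikelihood(double[] wordProbs) {
        double cost = 0.0;
        for (double p : wordProbs) {
            cost -= Math.log(p);
        }
        return cost;
    }

    // The direct product of the same probabilities; for long documents it
    // underflows to 0.0 and can no longer rank categories.
    public static double rawProduct(double[] wordProbs) {
        double prod = 1.0;
        for (double p : wordProbs) {
            prod *= p;
        }
        return prod;
    }

    public static void main(String[] args) {
        double[] probs = new double[500];
        Arrays.fill(probs, 1e-4); // made-up per-word probabilities
        System.out.println(rawProduct(probs));            // 0.0 (underflow)
        System.out.println(negativeLogLikelihood(probs)); // about 4605.17, far above 1
    }
}
{code}

Because -log is strictly decreasing, the category with the smallest such cost is the one with the largest probability, so a min-based loop is reasonable for this kind of score, provided the running minimum starts at Double.POSITIVE_INFINITY rather than 0.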
> BayesFeatureMapper doesn't properly extract features
> ----------------------------------------------------
>
> Key: MAHOUT-92
> URL: https://issues.apache.org/jira/browse/MAHOUT-92
> Project: Mahout
> Issue Type: Bug
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 0.1
>
> Attachments: MAHOUT-92.patch, MAHOUT-92.patch
>
>
> The BayesFeatureMapper currently has a bunch of unused variables and doesn't
> actually do anything. The problem is that it doesn't use the input value to
> generate a set of n-grams from which it could then generate tf-idf information.
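A minimal sketch of the extraction the description asks for, assuming whitespace tokenization, lowercasing, n-grams of length 1 to maxN, and idf = log(numDocs / docFreq). These choices and all names are hypothetical, not the BayesFeatureMapper implementation; in a Hadoop job the document frequencies would have to come from a separate pass over the corpus, which is not shown.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: turn an input value into n-grams, count them,
// and weight a count by tf-idf. Not Mahout's API.
public class NGramTfIdf {

    // All n-grams of length 1..maxN, tokens joined with a single space.
    public static List<String> ngrams(String text, int maxN) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> grams = new ArrayList<>();
        if (tokens.length == 1 && tokens[0].isEmpty()) {
            return grams; // blank input yields no n-grams
        }
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= tokens.length; i++) {
                grams.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
            }
        }
        return grams;
    }

    // Raw term frequency of each n-gram within one document.
    public static Map<String, Integer> termFrequencies(List<String> grams) {
        Map<String, Integer> tf = new HashMap<>();
        for (String g : grams) {
            tf.merge(g, 1, Integer::sum);
        }
        return tf;
    }

    // tf-idf with idf = log(numDocs / docFreq); docFreq must be > 0.
    public static double tfIdf(int tf, int docFreq, int numDocs) {
        return tf * Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        List<String> grams = ngrams("the cat sat on the mat", 2);
        Map<String, Integer> tf = termFrequencies(grams);
        System.out.println(tf.get("the"));                // 2
        System.out.println(tf.get("the cat"));            // 1
        System.out.println(tfIdf(tf.get("the"), 10, 10)); // 0.0: appears in every doc
    }
}
{code}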