[ 
https://issues.apache.org/jira/browse/MAHOUT-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644534#action_12644534
 ] 

Grant Ingersoll commented on MAHOUT-92:
---------------------------------------

{quote}
I also ran a train and a test over 20 newsgroups. Everything seems to be 
working at the moment.
{quote}

Can you share how you are running it?  When I run it, it completes, but all the 
results are "unknown".  Please update 
http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups if you have 
the time.

I'm looking at the BayesClassifier class, and I frankly don't get how it works 
anymore, especially the code at:
{code}
for (String category : categories) {
  double prob = documentProbability(model, category, document);
  if (prob < min) {
    min = prob;
    result.setLabel(category);
  }
}
{code}

That min value starts at 0, and a probability should be between 0 and 1, so 
how would that clause ever be satisfied such that the label gets set?  
Additionally, the values that come back for prob are much larger than 1.  
That's fine if they are supposed to be, but then we shouldn't be calling the 
value a probability.
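
To make the concern concrete, here is a minimal sketch, not the actual 
BayesClassifier code. It assumes the per-category values are non-negative 
scores where smaller is better (for example, negated log-likelihoods), which 
would be consistent with values much larger than 1. The class MinScoreSketch, 
the method pickLabel, and the sample scores are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MinScoreSketch {

  // Hypothetical stand-in for the loop in question: returns the category with
  // the smallest score, or "unknown" if no score is ever below initialMin.
  static String pickLabel(Map<String, Double> scores, double initialMin) {
    String label = "unknown";
    double min = initialMin;
    for (Map.Entry<String, Double> entry : scores.entrySet()) {
      double score = entry.getValue();
      if (score < min) {
        min = score;
        label = entry.getKey();
      }
    }
    return label;
  }

  // Made-up, non-negative scores (assumed to be negated log-likelihoods).
  static Map<String, Double> sampleScores() {
    Map<String, Double> scores = new LinkedHashMap<String, Double>();
    scores.put("sci.space", 812.4);
    scores.put("rec.autos", 640.9);
    scores.put("talk.politics.misc", 905.1);
    return scores;
  }

  public static void main(String[] args) {
    Map<String, Double> scores = sampleScores();
    // Starting at 0: no non-negative score is below 0, so no label is set.
    System.out.println(pickLabel(scores, 0.0));              // unknown
    // Starting at the largest double: the smallest score wins.
    System.out.println(pickLabel(scores, Double.MAX_VALUE)); // rec.autos
  }
}
```

Under that assumption, a min that starts at 0 can never be undercut, which 
would explain every result coming back as "unknown". Starting from 
Double.MAX_VALUE is one possible fix, but only if the values really are 
"smaller is better" scores.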


> BayesFeatureMapper doesn't properly extract features
> ----------------------------------------------------
>
>                 Key: MAHOUT-92
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-92
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-92.patch, MAHOUT-92.patch
>
>
> The BayesFeatureMapper currently has a bunch of unused variables and doesn't 
> actually do anything.  The problem is it is not using the input value to 
> generate a set of n-grams, from which it can then generate tf-idf information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
