[ https://issues.apache.org/jira/browse/MAHOUT-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644507#action_12644507 ]
Grant Ingersoll commented on MAHOUT-92:
---------------------------------------
Cool, this looks almost exactly like the patch I came up with based on looking
at some of the old patches.
{code}
Why is encoding and analyzer a required option in the command line?
{code}
My original patch, I believe, started by tokenizing and filtering the text
with whatever Lucene Analyzer was specified. I think that piece would be useful
to restore. That way you aren't limited to a simple whitespace tokenizer and can
easily plug in your own.
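
To make the idea concrete, here is a rough, self-contained sketch of feature
extraction with a pluggable tokenizer. It does not use the Lucene or Mahout
APIs: the class FeatureExtractionSketch and its methods are hypothetical
names, and simpleAnalyzerTokens is only a stand-in for what a real Lucene
Analyzer would do (lowercasing, stripping punctuation).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch only: none of these names come from Mahout or Lucene.
final class FeatureExtractionSketch {

    // Baseline: split on whitespace, keeping punctuation attached to tokens.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for an analyzer: lowercase, then split on anything that is
    // not a letter or a decimal digit.
    static List<String> simpleAnalyzerTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Build word n-grams from whatever tokenizer the caller plugs in.
    static List<String> ngrams(String text,
                               Function<String, List<String>> tokenizer,
                               int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> tokens = tokenizer.apply(text);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        String text = "Mahout, meet Lucene!";
        // Same input, two different tokenizers plugged in.
        System.out.println(ngrams(text, FeatureExtractionSketch::whitespaceTokens, 1));
        System.out.println(ngrams(text, FeatureExtractionSketch::simpleAnalyzerTokens, 2));
    }
}
```

With the whitespace tokenizer the punctuation stays glued to the tokens
("Mahout," and "Lucene!"), which is the kind of noise a configurable
analyzer would let you filter out before the n-grams are counted.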
{code}
The same goes for the default category. The classifier returns the first
category if all the categories have the same score or zero. I don't see any
problem in that.
{code}
The default category covers the case where there isn't sufficient evidence for
any category.
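
A rough sketch of what I mean, under some loud assumptions: scores are
non-negative, a score of zero means "no evidence", and a tie for the top
score counts as insufficient evidence. DefaultCategorySketch and classify are
hypothetical names, not the classifier's actual API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: not the actual Mahout classifier API.
final class DefaultCategorySketch {

    // Assumes non-negative scores where 0 means "no evidence".
    // Returns defaultCategory when there are no scores, the best score is
    // zero, or the best score is shared by more than one category.
    static String classify(Map<String, Double> scores, String defaultCategory) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        boolean tie = false;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double s = e.getValue();
            if (s > bestScore) {
                best = e.getKey();
                bestScore = s;
                tie = false;
            } else if (s == bestScore) {
                tie = true;
            }
        }
        if (best == null || tie || bestScore <= 0.0) {
            return defaultCategory;
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("sports", 0.0);
        scores.put("politics", 0.0);
        // No evidence for either category, so the default is returned
        // instead of whichever category happens to come first.
        System.out.println(classify(scores, "unknown"));

        scores.put("politics", 0.7);
        System.out.println(classify(scores, "unknown"));
    }
}
```

Returning the first category on a tie makes the result depend on iteration
order, whereas an explicit default makes the "don't know" case visible to the
caller.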
> BayesFeatureMapper doesn't properly extract features
> ----------------------------------------------------
>
> Key: MAHOUT-92
> URL: https://issues.apache.org/jira/browse/MAHOUT-92
> Project: Mahout
> Issue Type: Bug
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 0.1
>
> Attachments: MAHOUT-92.patch
>
>
> The BayesFeatureMapper currently has a bunch of unused variables and doesn't
> actually do anything. The problem is it is not using the input value to
> generate a set of n-grams, from which it can then generate tf-idf information.