[ https://issues.apache.org/jira/browse/MAHOUT-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011239#comment-14011239 ]

Andrew Palumbo commented on MAHOUT-1564:
----------------------------------------

I just had a closer look at MAHOUT-1252 and see that it covers this. I'll 
probably go ahead with this anyway, as it is really a minor subset of 1252 and I have 
already done a little work on it. Please give me a yell if this looks like 
something that would be vetoed outright. Thanks.

> Naive Bayes Classifier for New Text Documents
> ---------------------------------------------
>
>                 Key: MAHOUT-1564
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1564
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.9
>            Reporter: Andrew Palumbo
>             Fix For: 1.0
>
>
> MapReduce Naive Bayes implementation currently lacks the ability to classify 
> a new document (outside of the training/holdout corpus).  I've begun some 
> work on a "ClassifyNew" job which will do the following:
> 1. Vectorize a new text document using the dictionary and document 
> frequencies from the training/holdout corpus 
>     - this assumes the original corpus was vectorized using `seq2sparse`; step (1) 
> will reuse all of the same parameters.
> 2. Score and label a new document using a previously trained model.
> I think that it will be a useful addition to the NB package.  Unfortunately, 
> this is going to be mostly MR workhorse code and doesn't really introduce 
> much new logic. I will try to keep any new logic separate from MR code so 
> that it can be called from Scala for MAHOUT-1493.
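A minimal Java sketch of the two "ClassifyNew" steps, kept free of any MapReduce code as the description suggests. It assumes the training dictionary is available as a term-to-index map, the document frequencies as an index-to-count map, and the trained model as one array of per-feature log-weights per label. `ClassifyNewSketch`, `vectorize` and `classify` are hypothetical names, not Mahout's API. The regex tokenizer and plain `tf * log(numDocs / df)` weighting are simplifications and do not reproduce `seq2sparse`'s analyzer, n-gram, or normalization options.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of "ClassifyNew"; not Mahout's actual API. */
public class ClassifyNewSketch {

    /** Step 1: vectorize a new document using the training dictionary and document frequencies. */
    static Map<Integer, Double> vectorize(String text,
                                          Map<String, Integer> dictionary,
                                          Map<Integer, Integer> docFrequencies,
                                          int numDocs) {
        // Count raw term frequencies, keeping only terms seen in the training dictionary.
        // Assumption: a simple lowercase + non-word-character split stands in for the real analyzer.
        Map<Integer, Integer> termFreqs = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            Integer index = dictionary.get(token);
            if (index != null) {
                termFreqs.merge(index, 1, Integer::sum);
            }
        }
        // Assumption: plain tf-idf weighting, tf * log(numDocs / df), using the training df counts.
        Map<Integer, Double> vector = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : termFreqs.entrySet()) {
            int df = docFrequencies.getOrDefault(e.getKey(), 1);
            vector.put(e.getKey(), e.getValue() * Math.log((double) numDocs / df));
        }
        return vector;
    }

    /** Step 2: score the vector against each label's log-weights and return the highest-scoring label. */
    static String classify(Map<Integer, Double> vector, Map<String, double[]> labelWeights) {
        String bestLabel = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> label : labelWeights.entrySet()) {
            // Naive Bayes score: sum over features of (feature value * log-weight for this label).
            double score = 0.0;
            for (Map.Entry<Integer, Double> feature : vector.entrySet()) {
                score += feature.getValue() * label.getValue()[feature.getKey()];
            }
            if (score > bestScore) {
                bestScore = score;
                bestLabel = label.getKey();
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // Toy values standing in for the training corpus's dictionary, df counts, and trained model.
        Map<String, Integer> dictionary = Map.of("goal", 0, "match", 1, "vote", 2, "election", 3);
        Map<Integer, Integer> docFrequencies = Map.of(0, 2, 1, 3, 2, 2, 3, 3);
        int numDocs = 6;
        Map<String, double[]> labelWeights = Map.of(
            "sports",   new double[] {-1.0, -1.2, -4.0, -4.0},
            "politics", new double[] {-4.0, -4.0, -1.0, -1.2});

        Map<Integer, Double> vector =
            vectorize("A late goal decided the match.", dictionary, docFrequencies, numDocs);
        System.out.println(classify(vector, labelWeights)); // prints "sports"
    }
}
```

Because both methods take plain maps and arrays and have no Hadoop dependencies, an MR job could call them from a mapper, and the same logic could be ported or called from Scala.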



--
This message was sent by Atlassian JIRA
(v6.2#6252)