[jira] [Commented] (MAHOUT-1564) Naive Bayes Classifier for New Text Documents

Hudson (JIRA) Wed, 01 Apr 2015 15:01:23 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391560#comment-14391560 ]


Hudson commented on MAHOUT-1564:
--------------------------------

SUCCESS: Integrated in Mahout-Quality #3038 (See [https://builds.apache.org/job/Mahout-Quality/3038/])
MAHOUT-1564: Naive Bayes Classifier for New Text Documents closes apache/mahout#91 (apalumbo: rev 441460e77cd38acc684cb2351dad5f0e6156c1f0)
* examples/bin/spark-document-classifier.mscala
* CHANGELOG


> Naive Bayes Classifier for New Text Documents
> ---------------------------------------------
>
>                 Key: MAHOUT-1564
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1564
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.9
>            Reporter: Andrew Palumbo
>            Assignee: Andrew Palumbo
>              Labels: DSL, legacy, scala, spark
>             Fix For: 0.10.1, 0.10.0
>
>
> The MapReduce and DSL Naive Bayes implementations currently lack the ability to
> classify a new document (one outside of the training/holdout corpus).  This new
> feature will do the following:
> 1. Vectorize a new text document using the dictionary and document
> frequencies from the training/holdout corpus.
>     - Assume the original corpus was vectorized using `seq2sparse`; step (1)
> will use all of the same parameters.
> 2. Score and label the new document using a previously trained model.
> This effort will need to be done in parallel for the MRLegacy and DSL
> implementations.  Neither should be too much work.
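
The two steps in the description above can be sketched outside of Mahout as
follows. This is a minimal illustration in plain Python, not the code from the
commit and not Mahout's API: the function names, the whitespace tokenizer, the
plain tf * log(N / df) weighting, and the toy dictionary, document frequencies,
and model are all assumptions made for the example. A real run would reuse
whatever tokenizer and weighting parameters were passed to `seq2sparse`.

```python
import math
from collections import Counter


def vectorize(text, dictionary, doc_freq, num_docs):
    """Step 1: map a new document to a sparse TF-IDF vector {term_index: weight}.

    Only terms present in the training dictionary are kept. The tokenizer
    (lowercase + whitespace split) and the tf * log(N / df) weighting are
    assumptions for this sketch, not Mahout's exact formulas.
    """
    counts = Counter(tok for tok in text.lower().split() if tok in dictionary)
    vector = {}
    for term, tf in counts.items():
        idx = dictionary[term]
        # Dictionary terms come from the corpus, so doc_freq[idx] >= 1.
        vector[idx] = tf * math.log(num_docs / doc_freq[idx])
    return vector


def classify(vector, model):
    """Step 2: score the vector against each label and return the best one.

    `model` maps label -> (log_prior, per-term log probabilities indexed by
    term index), i.e. a standard multinomial Naive Bayes parameterization.
    """
    scores = {}
    for label, (log_prior, log_probs) in model.items():
        scores[label] = log_prior + sum(
            weight * log_probs[idx] for idx, weight in vector.items()
        )
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy stand-ins for the artifacts produced when the corpus was vectorized
# and the model was trained (all values are hypothetical).
dictionary = {"goal": 0, "match": 1, "election": 2, "vote": 3}
doc_freq = {0: 2, 1: 2, 2: 2, 3: 2}
num_docs = 4
model = {
    "sports": (math.log(0.5), [math.log(p) for p in (0.4, 0.4, 0.1, 0.1)]),
    "politics": (math.log(0.5), [math.log(p) for p in (0.1, 0.1, 0.4, 0.4)]),
}

new_vector = vectorize("The match ended with a late goal", dictionary, doc_freq, num_docs)
label, scores = classify(new_vector, model)
print(label)
```

The point of the sketch is that the new document never contributes new terms
or new document frequencies: it is projected onto the vocabulary and
statistics fixed at training time, so its vector is comparable with the
vectors the model was trained on.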



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
