Add option to DictionaryVectorizer to create (tf and tfidf) vectors on-the-fly 
using a given dictionary 
--------------------------------------------------------------------------------------------------------

                 Key: MAHOUT-521
                 URL: https://issues.apache.org/jira/browse/MAHOUT-521
             Project: Mahout
          Issue Type: New Feature
            Reporter: Robin Anil
            Assignee: Robin Anil
             Fix For: 0.4


The current dictionary vectorizer takes a set of text files, creates the 
dictionary, and converts the files to text vectors. In a classification scenario, 
the vectorizer needs to take an already existing dictionary and use its ids to 
convert text to vectors, and optionally do the following:

1. Choose between tf|tfidf weights (tfidf needs the document frequencies as an 
input)
2. Add new words to the dictionary, with options to write the dictionary to disk 
and read it back
3. Add an option to normalize/lognormalize the vectors
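
To make the request concrete, here is a minimal sketch in plain Java of the 
on-the-fly conversion described above. It does not use the Mahout API: the class 
name DictionaryVectorizerSketch, the vectorize method, and all of its parameters 
are hypothetical. The smoothed idf formula and the L2 normalization are 
assumptions chosen for illustration, and the actual weighting in Mahout may 
differ. Writing the dictionary to disk and log-normalization are left out.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the Mahout API.
final class DictionaryVectorizerSketch {

  enum Weight { TF, TFIDF }

  /**
   * Converts whitespace-tokenized text to a sparse vector (term id -> weight)
   * using an existing dictionary (term -> id).
   */
  static Map<Integer, Double> vectorize(String text,
                                        Map<String, Integer> dictionary,
                                        Map<Integer, Integer> documentFrequency,
                                        int numDocs,
                                        Weight weight,
                                        boolean addNewWords,
                                        boolean normalize) {
    // The next free id is one past the largest id already in the dictionary.
    int nextId = 0;
    for (int id : dictionary.values()) {
      nextId = Math.max(nextId, id + 1);
    }

    // Count raw term frequencies, keyed by dictionary id.
    Map<Integer, Double> vector = new TreeMap<>();
    for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
      if (token.isEmpty()) {
        continue;
      }
      Integer id = dictionary.get(token);
      if (id == null) {
        if (!addNewWords) {
          continue; // unknown words are dropped
        }
        id = nextId++;
        dictionary.put(token, id); // the caller's dictionary grows in place
      }
      vector.merge(id, 1.0, Double::sum);
    }

    if (weight == Weight.TFIDF) {
      // Assumed smoothed idf: log(numDocs / (df + 1)) + 1.
      vector.replaceAll((id, tf) -> {
        int df = documentFrequency.getOrDefault(id, 0);
        return tf * (Math.log((double) numDocs / (df + 1)) + 1.0);
      });
    }

    if (normalize) {
      // Assumed L2 normalization: scale the vector to unit Euclidean length.
      double sumOfSquares = 0.0;
      for (double w : vector.values()) {
        sumOfSquares += w * w;
      }
      final double norm = Math.sqrt(sumOfSquares);
      if (norm > 0.0) {
        vector.replaceAll((id, w) -> w / norm);
      }
    }
    return vector;
  }
}
```

With a dictionary {hello=0, world=1}, vectorizing "hello hello world foo" with 
tf weights and addNewWords=false gives a vector with weight 2.0 at id 0 and 1.0 
at id 1. With addNewWords=true, "foo" is assigned id 2 and added to the 
dictionary as well.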



 

-- 
This message is automatically generated by JIRA.