Add option to DictionaryVectorizer to create (tf and tfidf) vectors on-the-fly
using a given dictionary
--------------------------------------------------------------------------------------------------------
Key: MAHOUT-521
URL: https://issues.apache.org/jira/browse/MAHOUT-521
Project: Mahout
Issue Type: New Feature
Reporter: Robin Anil
Assignee: Robin Anil
Fix For: 0.4
Current dictionary vectorizer takes a set of text-files, creates the dictionary
and convert them to text vectors. In a classification scenario, the vectorizer
needs to take a Already existing dictionary and use the ids to convert text to
vectors and optionally do the following
1. Choose between tf|tfidf weights (need to take the document frequency as an
input for this)
2. Add new words to the dictionary and provide options to write it to the disk
and read it back
3. Add option to normalize/lognormalize
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.