[ https://issues.apache.org/jira/browse/MAHOUT-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil updated MAHOUT-521:
------------------------------
Attachment: MAHOUT-vectorizer-move.patch
Moving all dictionary vectorizer classes, MapReduce jobs, and tests to
core/.../vectorizer/
> Add option to DictionaryVectorizer to create (tf and tfidf) vectors
> on-the-fly using a given dictionary
> --------------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-521
> URL: https://issues.apache.org/jira/browse/MAHOUT-521
> Project: Mahout
> Issue Type: New Feature
> Reporter: Robin Anil
> Assignee: Robin Anil
> Fix For: 0.4
>
> Attachments: MAHOUT-vectorizer-move.patch
>
>
> The current dictionary vectorizer takes a set of text files, creates the
> dictionary, and converts the files to text vectors. In a classification
> scenario, the vectorizer needs to take an already existing dictionary and use
> its ids to convert text to vectors, and optionally do the following:
> 1. Choose between tf|tfidf weights (this needs the document frequency as an
> input)
> 2. Add new words to the dictionary, with options to write it to
> disk and read it back
> 3. Add an option to normalize/lognormalize the vectors
>
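To make the request above concrete, here is a minimal sketch of vectorizing one document against an existing dictionary. It is plain Python, not Mahout code: the function name vectorize, its parameters, the dict-based sparse vector, the idf formula log(num_docs / df), the fallback df of 1 for terms missing from doc_freq, and the use of the L2 norm are all assumptions made for illustration. Log-normalization and writing the dictionary to disk are left out.

```python
import math
from collections import Counter


def vectorize(tokens, dictionary, weight="tf", doc_freq=None, num_docs=None,
              add_new_words=False, normalize=False):
    """Turn a token list into a sparse {term_id: weight} vector.

    Hypothetical helper for illustration only; not part of Mahout.
    dictionary maps term -> integer id and is reused, not rebuilt.
    """
    if weight not in ("tf", "tfidf"):
        raise ValueError("weight must be 'tf' or 'tfidf'")
    if weight == "tfidf" and (doc_freq is None or num_docs is None):
        raise ValueError("tfidf weighting needs doc_freq and num_docs")

    vector = {}
    for term, tf in Counter(tokens).items():
        if term not in dictionary:
            if not add_new_words:
                continue  # terms outside the dictionary are dropped
            # Assumption: ids are dense 0..n-1, so the next free id is n.
            dictionary[term] = len(dictionary)
        term_id = dictionary[term]
        if weight == "tfidf":
            # Assumption: unseen terms get df = 1; idf = log(N / df).
            df = doc_freq.get(term_id, 1)
            vector[term_id] = tf * math.log(num_docs / df)
        else:
            vector[term_id] = float(tf)

    if normalize:
        # Assumption: "normalize" means scaling to unit L2 length.
        length = math.sqrt(sum(v * v for v in vector.values()))
        if length > 0:
            vector = {i: v / length for i, v in vector.items()}
    return vector


if __name__ == "__main__":
    dictionary = {"apache": 0, "mahout": 1}
    doc_freq = {0: 10, 1: 2}  # term id -> number of documents containing it
    tokens = ["mahout", "apache", "mahout", "hadoop"]
    print(vectorize(tokens, dictionary))
    print(vectorize(tokens, dictionary, weight="tfidf",
                    doc_freq=doc_freq, num_docs=20))
```

With add_new_words left at False, "hadoop" is ignored because it has no id in the dictionary. Passing add_new_words=True would assign it the next id and mutate the dictionary in place, which is the point at which a real implementation would need to persist the updated dictionary.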