Lucene filter for Collocations
------------------------------
Key: MAHOUT-415
URL: https://issues.apache.org/jira/browse/MAHOUT-415
Project: Mahout
Issue Type: New Feature
Affects Versions: 0.3
Reporter: Drew Farris
Assignee: Drew Farris
Collocations generated by Mahout could serve as a whitelist of terms to index
in Lucene. This patch will provide a way to generate a serialized BloomFilter
from CollocationsOutput, plus a Lucene filter that takes a BloomFilter and
emits only the tokens that are members of it. This would allow a set of
interesting collocations to be pre-computed for a corpus, and the documents
could then be indexed using only those collocations.
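
The idea can be sketched in plain Java, without the Lucene or Hadoop
dependencies. This is only an illustration under stated assumptions, not the
code in the patch: SimpleBloomFilter, keepWhitelisted, the filter size, and
the hash scheme are all hypothetical names and choices made for the example.
A real implementation would presumably wrap the membership test in a Lucene
TokenFilter and read an existing BloomFilter class from disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

class CollocationWhitelistSketch {

    /** Minimal Bloom filter; a hypothetical stand-in, not a Mahout or Hadoop class. */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int size;
        private final int hashCount;

        SimpleBloomFilter(int size, int hashCount) {
            this(new BitSet(size), size, hashCount);
        }

        private SimpleBloomFilter(BitSet bits, int size, int hashCount) {
            this.bits = bits;
            this.size = size;
            this.hashCount = hashCount;
        }

        /** Records a collocation by setting one bit per hash function. */
        void add(String term) {
            for (int i = 0; i < hashCount; i++) {
                bits.set(index(term, i));
            }
        }

        /** False means definitely absent; true means present or a false positive. */
        boolean mightContain(String term) {
            for (int i = 0; i < hashCount; i++) {
                if (!bits.get(index(term, i))) {
                    return false;
                }
            }
            return true;
        }

        /** Serializes the bit vector; size and hashCount must be stored separately. */
        byte[] toBytes() {
            return bits.toByteArray();
        }

        static SimpleBloomFilter fromBytes(byte[] data, int size, int hashCount) {
            return new SimpleBloomFilter(BitSet.valueOf(data), size, hashCount);
        }

        /** Double hashing: combines two base hashes to derive the i-th bit index. */
        private int index(String term, int i) {
            byte[] data = term.getBytes(StandardCharsets.UTF_8);
            int h1 = Arrays.hashCode(data);
            int h2 = fnv1a(data);
            return Math.floorMod(h1 + i * h2, size);
        }

        /** 32-bit FNV-1a hash. */
        private static int fnv1a(byte[] data) {
            int hash = 0x811C9DC5;
            for (byte b : data) {
                hash ^= (b & 0xff);
                hash *= 16777619;
            }
            return hash;
        }
    }

    /** Keeps only the tokens (e.g. n-grams) that the filter reports as members. */
    static List<String> keepWhitelisted(List<String> tokens, SimpleBloomFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (filter.mightContain(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        SimpleBloomFilter whitelist = new SimpleBloomFilter(1024, 3);
        whitelist.add("new york");
        whitelist.add("machine learning");

        List<String> ngrams = Arrays.asList("new york", "york is", "machine learning");
        System.out.println(keepWhitelisted(ngrams, whitelist));
    }
}
```

Because a Bloom filter can report false positives but never false negatives,
every whitelisted collocation is kept, while a small fraction of other tokens
may also slip through. Sizing the filter for the expected number of
collocations keeps that fraction low.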
--
This message is automatically generated by JIRA.