reuse lucene tokenstreams
-------------------------
Key: MAHOUT-706
URL: https://issues.apache.org/jira/browse/MAHOUT-706
Project: Mahout
Issue Type: Improvement
Reporter: Robert Muir
Priority: Minor
Attachments: MAHOUT-706.patch
Currently, Mahout uses Lucene's non-reusable analysis API.
This means that many objects are recreated for every "document" (e.g. every
TokenStream in the analysis chain and every Attribute).
This can add a lot of unnecessary overhead, particularly when "documents" are
short.
Switching to the reusable API (Analyzer.reusableTokenStream) instead looks like
an easy win.
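To illustrate the difference without tying it to a particular Lucene release, here is a minimal plain-Java sketch. SimpleTokenizer, SimpleAnalyzer and their methods are hypothetical stand-ins, not Lucene classes. The non-reusable path builds a new tokenizer for every document. The reusable path keeps one tokenizer per thread and points it at the next document's Reader. Lucene 3.x's Analyzer.reusableTokenStream(String fieldName, Reader reader) follows the same general idea, although its internals differ from this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    // Hypothetical stand-in for a Lucene Tokenizer: splits on whitespace and
    // can be re-pointed at a new Reader instead of being re-created.
    static final class SimpleTokenizer {
        static int instancesCreated = 0;
        private Reader input;
        private final StringBuilder term = new StringBuilder(); // reused term buffer

        SimpleTokenizer(Reader input) {
            this.input = input;
            instancesCreated++;
        }

        // Similar in spirit to resetting a tokenizer with new input.
        void reset(Reader newInput) {
            this.input = newInput;
            term.setLength(0);
        }

        // Advances to the next whitespace-delimited token; false at end of input.
        boolean incrementToken() throws IOException {
            term.setLength(0);
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (term.length() > 0) return true;
                } else {
                    term.append((char) c);
                }
            }
            return term.length() > 0;
        }

        String term() { return term.toString(); }
    }

    // Hypothetical analyzer offering both a non-reusable and a reusable entry point.
    static final class SimpleAnalyzer {
        private final ThreadLocal<SimpleTokenizer> previous = new ThreadLocal<>();

        // Allocates a fresh tokenizer for every document.
        SimpleTokenizer tokenStream(Reader reader) {
            return new SimpleTokenizer(reader);
        }

        // Reuses this thread's cached tokenizer, creating it only on first use.
        SimpleTokenizer reusableTokenStream(Reader reader) {
            SimpleTokenizer ts = previous.get();
            if (ts == null) {
                ts = new SimpleTokenizer(reader);
                previous.set(ts);
            } else {
                ts.reset(reader);
            }
            return ts;
        }
    }

    // Drains a tokenizer into a list of terms.
    static List<String> tokenize(SimpleTokenizer ts) {
        List<String> out = new ArrayList<>();
        try {
            while (ts.incrementToken()) out.add(ts.term());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] docs = {"a short doc", "another short doc", "third"};
        SimpleAnalyzer analyzer = new SimpleAnalyzer();

        SimpleTokenizer.instancesCreated = 0;
        for (String d : docs) tokenize(analyzer.tokenStream(new StringReader(d)));
        System.out.println("non-reusable: " + SimpleTokenizer.instancesCreated); // 3

        SimpleTokenizer.instancesCreated = 0;
        for (String d : docs) tokenize(analyzer.reusableTokenStream(new StringReader(d)));
        System.out.println("reusable: " + SimpleTokenizer.instancesCreated); // 1
    }
}
```

In a real Lucene analysis chain the saving is larger than this count suggests, because every filter in the chain and its Attributes would be rebuilt as well. The per-thread cache matters because a reused stream holds mutable state and must not be shared between threads.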