David Smiley created LUCENE-8323:
------------------------------------
Summary: New ConcatenateFilter, a TokenFilter to concat/join tokens
Key: LUCENE-8323
URL: https://issues.apache.org/jira/browse/LUCENE-8323
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Reporter: David Smiley
Assignee: David Smiley

Here I introduce the ConcatenateFilter (with Factory) to concatenate/join
tokens with a provided separator to produce one final token. It's similar to
FingerprintFilter but doesn't deduplicate or sort. It's useful for doing
exact-ish search on short text (think names or titles) with simple analysis.
For this task, it's faster than an equivalent PhraseQuery, and it solves the
problem of matching the complete token sequence rather than only a portion of
the tokens. It's also useful for
using Lucene to hold a dictionary of short names/phrases for entity-extraction
(aka text tagging). The OpenSextant SolrTextTagger uses it for this purpose,
which is where I'm taking it from.
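
To illustrate the behavior described above, here is a minimal sketch in plain Python. It is not the Lucene API or the actual filter code; the function names, the whitespace/lowercase "analysis", the default separator, and the sample names are all assumptions made up for illustration. It contrasts order-preserving concatenation with a fingerprint-style sort-and-deduplicate, and shows why a single concatenated token gives whole-value matching.

```python
def analyze(text):
    """Hypothetical stand-in for 'simple analysis': lowercase, split on whitespace."""
    return text.lower().split()


def concatenate(tokens, separator=" "):
    """Join all tokens in their original order into one final token.

    No deduplication and no sorting, as described for ConcatenateFilter.
    The default separator here is an assumption for this sketch.
    """
    return separator.join(tokens)


def fingerprint(tokens, separator=" "):
    """Contrast case: sort and deduplicate before joining (fingerprint-style)."""
    return separator.join(sorted(set(tokens)))


# Hypothetical dictionary of short names, keyed by the single concatenated token.
NAMES = ["New York", "New York City", "York"]
INDEX = {concatenate(analyze(name)): name for name in NAMES}


def exact_lookup(query):
    """Match only when the whole analyzed query equals a whole indexed name."""
    return INDEX.get(concatenate(analyze(query)))
```

In this sketch, `exact_lookup("new york")` finds "New York" but not "New York City", and `exact_lookup("New")` finds nothing, because the lookup compares one token that spans the entire value rather than matching a sub-sequence of tokens.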