[jira] [Created] (LUCENE-8323) New ConcatenateFilter, a TokenFilter to concat/join tokens

David Smiley (JIRA) Fri, 18 May 2018 14:15:16 -0700

David Smiley created LUCENE-8323:
------------------------------------

             Summary: New ConcatenateFilter, a TokenFilter to concat/join tokens
                 Key: LUCENE-8323
                 URL: https://issues.apache.org/jira/browse/LUCENE-8323
             Project: Lucene - Core
          Issue Type: New Feature
          Components: modules/analysis
            Reporter: David Smiley
            Assignee: David Smiley



Here I introduce the ConcatenateFilter (with Factory) to concatenate/join 
tokens with a provided separator to produce one final token.  It's similar to 
FingerprintFilter but doesn't deduplicate or sort.  It's useful for doing 
exact-ish search on short text (think names or titles) with simple analysis.  
At this task, it's faster than an equivalent PhraseQuery, and it solves the 
problem of matching the complete text rather than only a portion of the tokens.  
It's also useful for 
using Lucene to hold a dictionary of short names/phrases for entity-extraction 
(aka text tagging).  The OpenSextant SolrTextTagger uses it for this purpose, 
which is where I'm taking it from.
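
To make the difference from FingerprintFilter concrete, here is a minimal 
plain-Java sketch of the two joining behaviors described above.  It does not 
use the Lucene TokenStream API and is not the proposed implementation; the 
class and method names (ConcatenateSketch, concatenate, fingerprint) are 
hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.TreeSet;

// Illustration only: models the token-joining semantics, not the Lucene filter itself.
class ConcatenateSketch {

    // Concatenate-style: join tokens in their original order, keeping duplicates.
    static String concatenate(List<String> tokens, char separator) {
        return String.join(String.valueOf(separator), tokens);
    }

    // Fingerprint-style: deduplicate and sort the tokens before joining.
    static String fingerprint(List<String> tokens, char separator) {
        return String.join(String.valueOf(separator), new TreeSet<>(tokens));
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("new", "york", "new", "jersey");
        System.out.println(concatenate(tokens, ' ')); // new york new jersey
        System.out.println(fingerprint(tokens, ' ')); // jersey new york
    }
}
```

Because the whole field value becomes a single token, a term lookup for 
"new york" would not match the value "new york new jersey" in this model, 
which is the whole-value matching behavior described above.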



