[ https://issues.apache.org/jira/browse/SOLR-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503744#comment-16503744 ]
ASF subversion and git services commented on SOLR-12376:
--------------------------------------------------------
Commit 33b1c1d1416ed3b8dbce4066ad4b982a15e1b0d0 in lucene-solr's branch
refs/heads/branch_7x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=33b1c1d ]
SOLR-12376: AwaitsFix testStopWords pending LUCENE-8344
(cherry picked from commit 7c6d743)
> New TaggerRequestHandler (aka SolrTextTagger)
> ---------------------------------------------
>
> Key: SOLR-12376
> URL: https://issues.apache.org/jira/browse/SOLR-12376
> Project: Solr
> Issue Type: New Feature
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Major
> Fix For: 7.4
>
> Attachments: SOLR-12376.patch, SOLR-12376.patch, SOLR-12376.patch
>
>
> This issue introduces a new RequestHandler: {{TaggerRequestHandler}}, AKA the
> SolrTextTagger from the OpenSextant project
> [https://github.com/OpenSextant/SolrTextTagger]. It's used for named entity
> recognition (NER) of text passed to it. It doesn't do any NLP (outside of
> Lucene text analysis) so it's said to be a "naive tagger", but it's
> definitely useful as-is and a more complete NER or ERD (entity recognition
> and disambiguation) system can be built with this as a key component. The
> SolrTextTagger has been used on queries for query understanding, on full
> text, and on dictionaries that number in the tens of millions. Since it's
> small and has been used a good deal (including helping win an ERD
> competition and in [Apache
> Stanbol|https://stanbol.apache.org/]), several people have asked me when it
> will be in Solr, or why it isn't yet. So here it is.
>
> To use it, you first need a collection of documents that have a name-like
> field (short text) indexed with the ConcatenateFilter (LUCENE-8323) at the
> end of its analysis chain. We call this the dictionary. Once that's in place,
> you simply post text to a {{TaggerRequestHandler}} and it returns the offset
> pairs into that text for matches in the dictionary, along with the uniqueKey
> of the matching documents. It can also return any other document data you
> request. That's the gist; I'll add more details on use to the Solr Reference
> Guide.
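
The dictionary-tagging idea described above can be illustrated with a small, self-contained sketch. This is not the TaggerRequestHandler implementation and does not call Solr; it is a toy in plain Python that assumes a simple word tokenizer, lowercasing as the only "analysis", and a longest-match, non-overlapping policy. The function names (analyze, build_dictionary, tag) and the document ids are hypothetical and exist only for this example.

```python
import re
from collections import defaultdict

# Toy "analysis": word tokens, lowercased. A real setup would use a
# Lucene analysis chain instead (assumption for illustration only).
TOKEN = re.compile(r"\w+")


def analyze(text):
    """Return (token, start_offset, end_offset) for each word in text."""
    return [(m.group().lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]


def build_dictionary(docs):
    """Map each analyzed name (tokens joined by a space) to its doc ids.

    docs is {unique_key: name}; the names play the role of the
    name-like field in the dictionary collection.
    """
    index = defaultdict(list)
    for doc_id, name in docs.items():
        key = " ".join(tok for tok, _, _ in analyze(name))
        if key:
            index[key].append(doc_id)
    return dict(index)


def tag(text, index):
    """Return (start_offset, end_offset, doc_ids) for dictionary matches.

    Uses longest-match-first and skips past each match, so the
    returned tags never overlap.
    """
    tokens = analyze(text)
    max_len = max((len(key.split(" ")) for key in index), default=0)
    tags = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tok for tok, _, _ in tokens[i:i + n])
            if key in index:
                tags.append((tokens[i][1], tokens[i + n - 1][2], index[key]))
                i += n
                break
        else:
            # No dictionary entry starts at this token; move on.
            i += 1
    return tags


if __name__ == "__main__":
    dictionary = build_dictionary({"doc-1": "New York City", "doc-2": "New York"})
    print(tag("Hello New York City", dictionary))
    # prints [(6, 19, ['doc-1'])]
```

The offsets are character positions into the posted text, so the caller can slice the original string to recover the matched span; looking up other stored fields by the returned ids would be a separate step in this sketch.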
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)