[ https://issues.apache.org/jira/browse/OPENNLP-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joern Kottmann updated OPENNLP-755:
-----------------------------------
Summary: Add support to use a stop word list in query building (was: Add
support to use stop word list in query building)
> Add support to use a stop word list in query building
> -----------------------------------------------------
>
> Key: OPENNLP-755
> URL: https://issues.apache.org/jira/browse/OPENNLP-755
> Project: OpenNLP
> Issue Type: Improvement
> Components: Entity Linker
> Reporter: Joern Kottmann
>
> The geocoder in its current version might create queries that match on
> terms which should not drive the matching. These terms could be listed in
> a stop word list. This stop word list could be used to construct queries
> which match only the desired terms.
> For example:
> <START> New York City <END> is not in Slovenia
> This currently matches a hotel called "BTC City".
> The index is searched for all terms in the mention. The problem here is that
> if only "City" matches, the result will be poor. The same holds if only
> "New" and "City" match.
> Many place names contain the word "City", so it does little to
> disambiguate the matches.
> There should be some special logic dealing with stop words.
> The stop words could be removed from the location mention, or better, only
> used for boosting.
> For the case above it could be like this:
> - MUST match York
> - SHOULD match New OR City
> If a name consists only of stop words, e.g. "New City", we could require
> that the mention matches only the entire name.
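
The MUST/SHOULD split described above could be sketched roughly as follows. This is only an illustration, not the Entity Linker's actual code: the class name StopWordQuerySketch, the method buildQuery, and the two-entry stop word list are all hypothetical. It assumes the index is queried with a Lucene-style query string in which a "+" prefix marks a required term and unprefixed terms are optional. For a mention made up only of stop words it falls back to a quoted phrase, which approximates "match the entire name" but does not strictly enforce it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordQuerySketch {

    // Hypothetical stop word list; a real one would be loaded from a resource.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("new", "city"));

    /**
     * Builds a Lucene-style query string from a location mention.
     * Non-stop-word terms become required ("+term"); stop words stay
     * optional, so they only influence ranking.
     */
    static String buildQuery(String mention, Set<String> stopWords) {
        String trimmed = mention.trim();
        if (trimmed.isEmpty()) {
            return "";
        }

        List<String> must = new ArrayList<>();
        List<String> should = new ArrayList<>();
        for (String token : trimmed.split("\\s+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (stopWords.contains(term)) {
                should.add(term);      // SHOULD match: optional, used for boosting
            } else {
                must.add("+" + term);  // MUST match: required term
            }
        }

        if (must.isEmpty()) {
            // Mention consists only of stop words: require the whole phrase.
            return "\"" + String.join(" ", should) + "\"";
        }

        List<String> clauses = new ArrayList<>(must);
        clauses.addAll(should);
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("New York City", STOP_WORDS));
        System.out.println(buildQuery("New City", STOP_WORDS));
    }
}
```

With this sketch, "New York City" yields the query string +york new city, while "New City" yields the phrase "new city". A hotel named "BTC City" would no longer match the first query, because it lacks the required term.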
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)