[ https://issues.apache.org/jira/browse/OPENNLP-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann updated OPENNLP-755:
-----------------------------------
    Summary: Add support to use a stop word list in query building  (was: Add 
support to use stop word list in query building)

> Add support to use a stop word list in query building
> -----------------------------------------------------
>
>                 Key: OPENNLP-755
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-755
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Entity Linker
>            Reporter: Joern Kottmann
>
> In its current version, the geocoder may create queries that match on terms
> that should not drive the match. These terms could be listed in a stop word
> list, and that list could then be used to construct queries that match only
> the desired terms.
> For example:
> <START> New York City <END> is not in Slovenia
> This mention currently matches a hotel called "BTC City".
> The index is searched for all terms in the mention, so a candidate is returned
> even when only "City" matches, or when only "New" and "City" match. In both
> cases the result is poor.
> Many place names contain the word "City", so it does little to disambiguate
> the matches.
> There should be some special logic for dealing with stop words.
> The stop words could be removed from the location mention or, better, used
> only for boosting.
> For the case above, the query could look like this:
> - MUST match York
> - SHOULD match New OR City
> If a name consists only of stop words, e.g. "New City", we could require
> that the mention match the entire name.
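A minimal sketch of the MUST/SHOULD scheme described above, not the geocoder's actual implementation. It assumes the mention is already tokenized and renders the query in Lucene classic query parser syntax, where a leading `+` marks a required term and unmarked terms are optional (they only raise the score) once at least one required term is present. The class `StopWordQueryBuilder` and the exact-name field `name_exact` are hypothetical, and tokens are assumed to contain no query-syntax special characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: splits mention tokens into required (non-stop) and optional (stop) terms.
class StopWordQueryBuilder {

    private final Set<String> stopWords = new HashSet<>();

    StopWordQueryBuilder(Set<String> stopWords) {
        // Compare case-insensitively, so "City" and "city" are treated the same.
        for (String w : stopWords) {
            this.stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    private boolean isStopWord(String token) {
        return stopWords.contains(token.toLowerCase(Locale.ROOT));
    }

    /** Builds a query string in Lucene classic query parser syntax (assumption). */
    String build(String[] mentionTokens) {
        if (mentionTokens.length == 0) {
            throw new IllegalArgumentException("empty mention");
        }
        List<String> must = new ArrayList<>();
        List<String> should = new ArrayList<>();
        for (String token : mentionTokens) {
            if (isStopWord(token)) {
                should.add(token);       // SHOULD: only used for boosting
            } else {
                must.add("+" + token);   // MUST: the term has to match
            }
        }
        if (must.isEmpty()) {
            // Name made only of stop words: require the whole name as a phrase
            // on a hypothetical exact-name field.
            return "name_exact:\"" + String.join(" ", mentionTokens) + "\"";
        }
        List<String> clauses = new ArrayList<>(must);
        clauses.addAll(should);
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        StopWordQueryBuilder builder =
            new StopWordQueryBuilder(new HashSet<>(Arrays.asList("New", "City")));
        System.out.println(builder.build(new String[] {"New", "York", "City"})); // +York New City
        System.out.println(builder.build(new String[] {"New", "City"}));         // name_exact:"New City"
    }
}
```

With "New" and "City" as stop words, "New York City" yields `+York New City`: a candidate such as "BTC City" no longer matches, because "York" is required, while "New" and "City" still rank the better candidates higher.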



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
