[ 
https://issues.apache.org/jira/browse/OPENNLP-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413210#comment-16413210
 ] 

Xiang Zhang commented on OPENNLP-755:
-------------------------------------

Hi, is there any guidance on how I can start working on this?

> Add support to use a stop word list in query building
> -----------------------------------------------------
>
>                 Key: OPENNLP-755
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-755
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Entity Linker
>            Reporter: Joern Kottmann
>            Priority: Major
>
> The geocoder in its current version might create queries that match on 
> terms which should not drive the match. These terms could be listed in 
> a stop word list. This stop word list could be used to construct queries 
> which match only the desired terms.
> For example:
> <START> New York City <END> is not in Slovenia
> This currently matches a hotel called "BTC City".
> The index is searched for all terms in the mention. The problem is that 
> if only "City" matches, or only "New" and "City" match, the result will 
> be poor.
> Many place names contain the word "City" and that doesn't help much to 
> disambiguate the matches.
> There should be some special logic dealing with stop words.
> The stop words could be removed from the location mention, or better, only 
> used for boosting.
> For the case above it could be like this:
> - MUST match York
> - SHOULD match New OR City
> If a name consists only of stop words, e.g. "New City", we could require 
> that the mention match only the entire name.
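As a possible starting point, the MUST/SHOULD rule described in the issue could be sketched roughly as below. This is only an illustration under stated assumptions, not the geocoder's actual code: the class name StopWordQuerySketch, the method buildQuery, and the two-word stop list are all hypothetical. It emits a string in Lucene classic query syntax, where a "+" prefix marks a required term, bare terms are optional, and double quotes form a phrase. The phrase query in the stop-words-only case is only an approximation of "match the entire name", since a phrase can still occur inside a longer name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordQuerySketch {

    // Hypothetical stop word list; a real one would be loaded from a resource.
    static final Set<String> STOP_WORDS = Set.of("new", "city");

    // Builds a query string from a location mention:
    // non-stop words become required (+term), stop words stay optional.
    static String buildQuery(String mention, Set<String> stopWords) {
        List<String> required = new ArrayList<>();
        List<String> optional = new ArrayList<>();

        for (String token : mention.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for an empty mention
            }
            if (stopWords.contains(token)) {
                optional.add(token);
            } else {
                required.add("+" + token);
            }
        }

        if (required.isEmpty() && optional.isEmpty()) {
            return "";
        }
        if (required.isEmpty()) {
            // Only stop words: fall back to a phrase query over the whole mention.
            return "\"" + String.join(" ", optional) + "\"";
        }

        // Required terms first, then the optional stop words.
        required.addAll(optional);
        return String.join(" ", required);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("New York City", STOP_WORDS)); // +york new city
        System.out.println(buildQuery("New City", STOP_WORDS));      // "new city"
    }
}
```

With this rule, "BTC City" would no longer match the mention "New York City", because it does not contain the required term "york".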



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
