Joern Kottmann created OPENNLP-755:
--------------------------------------

             Summary: Add support to use stop word list in query building
                 Key: OPENNLP-755
                 URL: https://issues.apache.org/jira/browse/OPENNLP-755
             Project: OpenNLP
          Issue Type: Improvement
          Components: Entity Linker
            Reporter: Joern Kottmann


The geocoder in its current version can create queries that match on terms 
which should not drive the match. Such terms could be listed in a stop 
word list, which could then be used to construct queries that match 
only the desired terms.

For example:
<START> New York City <END> is not in Slovenia

This currently matches a hotel called "BTC City".

The index is searched for all terms in the mention. The problem is that the 
result is poor if only "City" matches, or if only "New" and "City" 
match.

Many place names contain the word "City", so it does little to 
disambiguate the matches.

There should be some special logic for dealing with stop words.
The stop words could be removed from the location mention or, better, used 
only for boosting.

For the case above it could be like this:
- MUST match York
- SHOULD match New OR City
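
The MUST/SHOULD split above could be sketched roughly as follows. This is 
only an illustration, not the geocoder's code: the class and method names 
are hypothetical, and it assumes that the index is queried with a 
Lucene-style query string (where a leading "+" marks a required term and a 
bare term is optional) and that indexed terms are lower-cased.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of OpenNLP. */
final class StopWordQuerySketch {

    /**
     * Turns a mention into a query string: terms outside the stop word list
     * become required ("+term"), stop words stay optional and can only add
     * to the score of a match. Assumes whitespace tokenization and a
     * lower-cased index; query syntax characters are not escaped here.
     */
    static String buildQuery(String mention, Set<String> stopWords) {
        List<String> clauses = new ArrayList<>();
        for (String token : mention.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank mention yields no clauses
            }
            String term = token.toLowerCase(Locale.ROOT);
            clauses.add(stopWords.contains(term) ? term : "+" + term);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // Example stop word list; the real list would be configurable.
        Set<String> stopWords = new HashSet<>(Arrays.asList("new", "city"));
        System.out.println(buildQuery("New York City", stopWords));
        // prints: new +york city
    }
}
```

With this split, a name that contains only "City" can no longer match the 
mention "New York City" on its own. A mention made up entirely of stop 
words produces only optional terms here, so that case needs separate 
handling.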

If a name consists only of stop words, e.g. "New City", we could require that 
the mention matches only the entire name.
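
One possible way to handle the stop-word-only case is to filter the 
candidates after the search. The sketch below is an assumption about how 
that could look, not existing code: the class and method names are 
hypothetical, candidates are represented as plain name strings, and names 
are compared after whitespace tokenization and lower-casing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of OpenNLP. */
final class StopWordOnlyMentionSketch {

    /** Splits on whitespace and lower-cases each term. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** True if the mention has at least one term and all terms are stop words. */
    static boolean consistsOnlyOfStopWords(String mention, Set<String> stopWords) {
        List<String> terms = tokenize(mention);
        return !terms.isEmpty() && stopWords.containsAll(terms);
    }

    /**
     * For a stop-word-only mention, keeps only candidates whose entire name
     * equals the mention. Other mentions are passed through unchanged.
     */
    static List<String> filterCandidates(String mention, List<String> candidateNames,
                                         Set<String> stopWords) {
        if (!consistsOnlyOfStopWords(mention, stopWords)) {
            return new ArrayList<>(candidateNames);
        }
        List<String> mentionTerms = tokenize(mention);
        List<String> kept = new ArrayList<>();
        for (String name : candidateNames) {
            if (tokenize(name).equals(mentionTerms)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

For the mention "New City" and the stop words "new" and "city", the 
candidates "BTC City" and "New York City" would be dropped and only a 
place named exactly "New City" would remain.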



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
