Joern Kottmann created OPENNLP-755:
--------------------------------------
Summary: Add support to use stop word list in query building
Key: OPENNLP-755
URL: https://issues.apache.org/jira/browse/OPENNLP-755
Project: OpenNLP
Issue Type: Improvement
Components: Entity Linker
Reporter: Joern Kottmann
The geocoder in its current version can create queries that match on terms
that should not be used for matching. These terms could be listed in a stop
word list, which could then be used to construct queries that match only the
desired terms.
For example:
<START> New York City <END> is not in Slovenia
This currently matches a hotel called "BTC City".
The index is searched for all terms in the mention. The problem is that the
result is poor if only "City" matches, or if only "New" and "City" match.
Many place names contain the word "City", so it does little to disambiguate
the matches.
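To make the failure concrete, here is a minimal Python sketch of any-term
matching. It is not the geocoder's code: the function names `tokenize` and
`shared_terms` and the whitespace tokenization are assumptions made only for
illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a simplification of real analysis)."""
    return text.lower().split()


def shared_terms(mention, candidate_name):
    """Return the mention terms that also occur in the candidate name."""
    candidate = set(tokenize(candidate_name))
    return [term for term in tokenize(mention) if term in candidate]


mention = "New York City"
print(shared_terms(mention, "BTC City"))       # ['city']
print(shared_terms(mention, "New York City"))  # ['new', 'york', 'city']
```

If any shared term is enough to produce a hit, "BTC City" qualifies on "city"
alone, even though that term carries almost no information about the place.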
There should be some special logic dealing with stop words.
The stop words could be removed from the location mention or, better, used
only for boosting.
For the case above it could be like this:
- MUST match York
- SHOULD match New OR City
If a name consists only of stop words, e.g. "New City", we could require that
the mention matches only the entire name.
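The rules above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not a proposed patch: the stop word
list and the function name `build_query` are hypothetical, tokenization is
plain whitespace splitting, and the output uses the classic Lucene query
parser syntax, where `+term` marks a required clause, a bare term is optional
under the default OR operator, and a quoted string is a phrase.

```python
# Hypothetical stop word list; a real one would be loaded from a resource.
STOP_WORDS = frozenset({"new", "city"})


def build_query(mention, stop_words=STOP_WORDS):
    """Build a query string from a location mention.

    Non-stop words become required clauses, stop words become optional
    clauses that can only improve the ranking. A mention made up entirely
    of stop words becomes a single required phrase.
    """
    terms = mention.lower().split()
    if not terms:
        return ""

    required = [t for t in terms if t not in stop_words]
    optional = [t for t in terms if t in stop_words]

    if not required:
        # Only stop words: require the whole mention as one phrase.
        return '+"' + " ".join(terms) + '"'

    return " ".join(["+" + t for t in required] + optional)


print(build_query("New York City"))  # +york new city
print(build_query("New City"))       # +"new city"
```

Two caveats: a phrase clause also matches longer names that merely contain
the phrase, so requiring a match on the entire name would need an additional
exact-match check, and the sketch does not escape query syntax characters
that may occur in a mention.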