[
https://issues.apache.org/jira/browse/OPENNLP-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319226#comment-15319226
]
Mark Giaconia commented on OPENNLP-756:
---------------------------------------
Finally got around to implementing this in the last few days; sorry it's been
so long. The approach I am taking is that each country, province, and county
(where county data exists) entry will carry a regex, and those regexes will be
used in the AdminBoundaryContextGenerator to discover mentions in text. The
scorers CountryProximityScorer and ProvinceProximityScorer now both use those
regexes as well, and this "should" (the evil S word) improve precision and
recall. I am testing it now and will commit as soon as the kinks are worked out.
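
To illustrate the idea of regex-driven mention discovery, here is a minimal
Java sketch. It is not the code in AdminBoundaryContextGenerator: the class
name RegexMentionSketch, the methods findMentions and examplePatterns, the
boundary codes, and the patterns themselves are all assumptions made up for
this example, and the real countrycontext file format is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; names and patterns are illustrative assumptions.
class RegexMentionSketch {

    // For each boundary code, collect the start offset of every regex hit in the text.
    // Boundaries with no hits are left out of the result.
    static Map<String, List<Integer>> findMentions(String text,
                                                   Map<String, Pattern> boundaryPatterns) {
        Map<String, List<Integer>> mentions = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : boundaryPatterns.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                List<Integer> offsets = mentions.get(entry.getKey());
                if (offsets == null) {
                    offsets = new ArrayList<>();
                    mentions.put(entry.getKey(), offsets);
                }
                offsets.add(matcher.start());
            }
        }
        return mentions;
    }

    // Example patterns a user might supply; one regex covers several name variants.
    static Map<String, Pattern> examplePatterns() {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("US", Pattern.compile(
                "\\b(?:United States|USA|America)\\b", Pattern.CASE_INSENSITIVE));
        patterns.put("GB", Pattern.compile(
                "\\b(?:United Kingdom|Great Britain|UK)\\b", Pattern.CASE_INSENSITIVE));
        return patterns;
    }
}
```

Calling RegexMentionSketch.findMentions(text, RegexMentionSketch.examplePatterns())
returns, per boundary code, the character offsets where a mention was found. A
proximity scorer could then compare those offsets with the offsets of toponyms
in the same document, which is the kind of reuse described above.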
> GeoEntityLinker Admin Boundary context generator should allow regex for more
> flexibility and better discovery of location context
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: OPENNLP-756
> URL: https://issues.apache.org/jira/browse/OPENNLP-756
> Project: OpenNLP
> Issue Type: Improvement
> Components: Entity Linker
> Affects Versions: addons-1.6.0
> Environment: java 7
> Reporter: Mark Giaconia
> Assignee: Mark Giaconia
> Fix For: addons-1.6.0
>
> Attachments: newCountryContextFile.txt
>
>
> Currently the way the AdminBoundaryContextGenerator discovers Country,
> Province, and County mentions is inflexible and misses a lot of mentions. The
> GeoEntityLinker should support regexes in the countrycontext file so that it
> will find more mentions based on user-defined extensions via regex. This
> change propagates to several other classes called within the GeoEntityLinker.