[ https://issues.apache.org/jira/browse/OPENNLP-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Giaconia updated OPENNLP-756:
----------------------------------
Attachment: newCountryContextFile.txt
I attached the file I used for testing regex support. It contains regexes for
the United States, the UAE, and Netherlands|holland. The only problem arises if
someone inserts a literal tab (\t) character into a regex, which will break the
tab-delimited format of the file.
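A minimal Java sketch of the tab problem. It assumes a simplified two-column line (a code, then a regex) rather than the real countrycontext layout. The names ContextLineParser, Entry, and parse are hypothetical and not part of the OpenNLP API. The parser rejects any line whose field count is wrong, which is exactly what a literal tab inside a regex produces. It uses only Java 7 features to match the issue's environment.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ContextLineParser {

    /** One parsed row (hypothetical layout): an identifier and its compiled regex. */
    static final class Entry {
        final String code;
        final Pattern pattern;

        Entry(String code, Pattern pattern) {
            this.code = code;
            this.pattern = pattern;
        }
    }

    /**
     * Parses a line assumed to hold exactly two tab-separated fields: code, regex.
     * A literal tab inside the regex yields an extra field, so the line is rejected.
     */
    static Entry parse(String line) {
        // limit -1 keeps trailing empty fields so malformed lines are not silently accepted
        String[] fields = line.split("\t", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException(
                "expected 2 tab-separated fields but found " + fields.length + ": " + line);
        }
        try {
            // CASE_INSENSITIVE so "holland" also matches "Holland"
            Pattern p = Pattern.compile(fields[1], Pattern.CASE_INSENSITIVE);
            return new Entry(fields[0], p);
        } catch (PatternSyntaxException e) {
            throw new IllegalArgumentException("invalid regex in line: " + line, e);
        }
    }

    public static void main(String[] args) {
        Entry nl = parse("NL\tnetherlands|holland");
        System.out.println(nl.code + " -> " + nl.pattern.matcher("Trip to Holland").find());

        // The two-character escape \t (backslash, t) keeps the line intact.
        Entry escaped = parse("XX\tfoo\\tbar");
        System.out.println(escaped.pattern.matcher("foo\tbar").find());

        try {
            parse("XX\tfoo\tbar"); // literal tab inside the regex splits it into 3 fields
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Writing the two-character escape \t in the file instead of a literal tab avoids the problem, because java.util.regex.Pattern interprets that escape as a tab character.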
> GeoEntityLinker Admin Boundary context generator should allow regex for more
> flexibility and better discovery of location context
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: OPENNLP-756
> URL: https://issues.apache.org/jira/browse/OPENNLP-756
> Project: OpenNLP
> Issue Type: Improvement
> Components: Entity Linker
> Affects Versions: addons-1.6.0
> Environment: java 7
> Reporter: Mark Giaconia
> Assignee: Mark Giaconia
> Fix For: addons-1.6.0
>
> Attachments: newCountryContextFile.txt
>
>
> Currently, the way the AdminBoundaryContextGenerator discovers Country,
> Province, and County mentions is inflexible and misses many mentions. The
> GeoEntityLinker should support regexes in the countrycontext file so that it
> finds more mentions through user-defined regex extensions. This change
> propagates to several other classes called within the GeoEntityLinker.
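A rough sketch of regex-based mention discovery, assuming each context entry has already been compiled into a java.util.regex.Pattern keyed by a code. RegexMentionFinder, Mention, find, and the example patterns are hypothetical; this is not how AdminBoundaryContextGenerator is actually implemented. Every pattern is run over the text, and each match is recorded with its character offsets. It uses only Java 7 features.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMentionFinder {

    /** One mention: the code of the pattern that matched, its offsets, and the matched text. */
    static final class Mention {
        final String code;
        final int start;
        final int end;
        final String text;

        Mention(String code, int start, int end, String text) {
            this.code = code;
            this.start = start;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return code + " [" + start + ", " + end + ") \"" + text + "\"";
        }
    }

    /** Runs every pattern over the text and records each match it finds. */
    static List<Mention> find(String text, Map<String, Pattern> patterns) {
        List<Mention> mentions = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher m = entry.getValue().matcher(text);
            while (m.find()) {
                mentions.add(new Mention(entry.getKey(), m.start(), m.end(), m.group()));
            }
        }
        return mentions;
    }

    public static void main(String[] args) {
        // Hypothetical user-defined patterns; LinkedHashMap keeps insertion order for stable output.
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("US", Pattern.compile("\\bunited states( of america)?\\b", Pattern.CASE_INSENSITIVE));
        patterns.put("AE", Pattern.compile("\\b(uae|united arab emirates)\\b", Pattern.CASE_INSENSITIVE));
        patterns.put("NL", Pattern.compile("\\b(netherlands|holland)\\b", Pattern.CASE_INSENSITIVE));

        String text = "Flights from the United States to Holland and the UAE.";
        for (Mention mention : find(text, patterns)) {
            System.out.println(mention);
        }
    }
}
```

Because each pattern is scanned independently, two user-defined patterns that overlap can report the same span twice; a real implementation would need a rule for resolving that.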
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)