[ https://issues.apache.org/jira/browse/OPENNLP-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Giaconia updated OPENNLP-756:
----------------------------------
    Attachment: newCountryContextFile.txt

I attached the file I used for testing regex support. It contains regexes for 
United States, UAE, and Netherlands|holland. The only problem is if someone 
inserts a literal tab (\t) character into their regex, which will break the 
tab-delimited format of the file.
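A minimal sketch of the tab problem follows. It assumes a hypothetical three-column row (code, name, regex) with the regex in the last column; the real countrycontext column layout is not given here, and the class and method names are illustrative, not OpenNLP API. Compiling with CASE_INSENSITIVE is also an assumption. The sketch shows that a literal tab inside the regex produces an extra field, while the two-character escape backslash-t leaves the row intact because java.util.regex.Pattern reads `\t` as "match a tab".

```java
import java.util.regex.Pattern;

public class ContextLineDemo {

    // Hypothetical layout: code<TAB>name<TAB>regex. Only "tab-delimited,
    // with a regex column" comes from the issue comment.
    static final int EXPECTED_FIELDS = 3;

    /** Splits one row and compiles its last field; throws if the column count is off. */
    static Pattern parseRegexColumn(String line) {
        // limit -1 keeps trailing empty fields so a miscount is always visible
        String[] fields = line.split("\t", -1);
        if (fields.length != EXPECTED_FIELDS) {
            throw new IllegalArgumentException(
                "expected " + EXPECTED_FIELDS + " fields but found " + fields.length);
        }
        // CASE_INSENSITIVE is an assumption, not confirmed by the issue
        return Pattern.compile(fields[EXPECTED_FIELDS - 1], Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        // Fine: the regex contains no tab character.
        String ok = "NL\tNetherlands\tnetherlands|holland";
        System.out.println(parseRegexColumn(ok).matcher("Holland").matches()); // true

        // Also fine: the file holds the two characters '\' 't', not a tab,
        // and Pattern interprets that escape as "match a tab".
        String escaped = "US\tUnited States\tunited\\tstates";
        System.out.println(parseRegexColumn(escaped).matcher("united\tstates").matches()); // true

        // Broken: a literal tab inside the regex creates a fourth field.
        String broken = "US\tUnited States\tunited\tstates";
        try {
            parseRegexColumn(broken);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // expected 3 fields but found 4
        }
    }
}
```

Writing `\t` as an escape in the file, rather than pasting a real tab, is one way a user could match tabs without breaking the format.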


> GeoEntityLinker Admin Boundary context generator should allow regex for more 
> flexibility and better discovery of location context
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-756
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-756
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Entity Linker
>    Affects Versions: addons-1.6.0
>         Environment: java 7
>            Reporter: Mark Giaconia
>            Assignee: Mark Giaconia
>             Fix For: addons-1.6.0
>
>         Attachments: newCountryContextFile.txt
>
>
> Currently, the way the AdminBoundaryContextGenerator discovers Country, 
> Province, and County mentions is inflexible and misses many mentions. The 
> GeoEntityLinker should support regexes in the countrycontext file so that it 
> finds more mentions through user-defined regex extensions. This change 
> propagates to several other classes called within the GeoEntityLinker.

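To illustrate how regexes can find more mentions than fixed names, here is a sketch that runs one compiled pattern per country code over a text and collects every match with Matcher.find(). The patterns, the country codes, the Mention class, and findMentions are hypothetical; they are not the contents of newCountryContextFile.txt or the AdminBoundaryContextGenerator API. It is kept Java 7 compatible to match the issue's environment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMentionSketch {

    /** One mention: the code whose regex matched and the matched character span. */
    static final class Mention {
        final String code;
        final int start;
        final int end; // exclusive

        Mention(String code, int start, int end) {
            this.code = code;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return code + "[" + start + "," + end + ")";
        }
    }

    /** Runs every pattern over the text and returns all matches, ordered by start offset. */
    static List<Mention> findMentions(Map<String, Pattern> patternsByCode, String text) {
        List<Mention> mentions = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : patternsByCode.entrySet()) {
            Matcher m = entry.getValue().matcher(text);
            while (m.find()) { // find() scans for successive non-overlapping matches
                mentions.add(new Mention(entry.getKey(), m.start(), m.end()));
            }
        }
        // Anonymous Comparator (no lambdas) keeps this Java 7 compatible.
        Collections.sort(mentions, new Comparator<Mention>() {
            @Override
            public int compare(Mention a, Mention b) {
                return Integer.compare(a.start, b.start);
            }
        });
        return mentions;
    }

    public static void main(String[] args) {
        // Hypothetical patterns; case-insensitive matching is an assumption.
        int flags = Pattern.CASE_INSENSITIVE;
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("US", Pattern.compile("united\\s+states|\\busa\\b", flags));
        patterns.put("AE", Pattern.compile("united\\s+arab\\s+emirates|\\buae\\b", flags));
        patterns.put("NL", Pattern.compile("netherlands|holland", flags));

        String text = "Flights from Holland to the UAE and the United States resumed.";
        System.out.println(findMentions(patterns, text));
        // prints [NL[13,20), AE[28,31), US[40,53)]
    }
}
```

A fixed-string lookup for "Netherlands" would miss "Holland" and "UAE" here; the alternations in each regex are what pick them up.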


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
