Currently the regex-based discovery of country context is done in a CountryContext class, which sits behind the GeoEntityLinker impl itself. This class's regexfind method takes the full doc text as a param and returns a hashmap of each country code to a set of mentions in the doc:

public Map<String, Set<Integer>> regexfind(String docText, EntityLinkerProperties properties)

This could be done as a NameFinder impl extension. I did think of that initially, but since it was specific to the GeoEntityLinker impl I didn't bother.

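To make the shape of that return value concrete, here is a rough, self-contained sketch of the idea. It is not the actual CountryContext code: the class name CountryContextSketch, the Map<String, Pattern> parameter (standing in for whatever EntityLinkerProperties would supply), and the choice to store the start offset of each match in the Set<Integer> are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for CountryContext; not the real OpenNLP class.
final class CountryContextSketch {

    /**
     * Scans the full document text once per country pattern and collects,
     * for each country code, the start offsets of every match.
     * Assumption: the Integer values are character offsets of mentions.
     */
    static Map<String, Set<Integer>> regexFind(String docText,
                                               Map<String, Pattern> countryPatterns) {
        Map<String, Set<Integer>> hits = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : countryPatterns.entrySet()) {
            Matcher matcher = entry.getValue().matcher(docText);
            // Matcher.find() advances through the text; start() and end()
            // give the span of the current match.
            while (matcher.find()) {
                hits.computeIfAbsent(entry.getKey(), code -> new TreeSet<>())
                    .add(matcher.start());
            }
        }
        // Countries with no match are simply absent from the map.
        return hits;
    }
}
```

A caller would build the pattern map from configuration, e.g. "US" mapped to Pattern.compile("\\b(?:United States|U\\.S\\.)"), and pass the whole document text in one call. If the end offsets are needed as well, matcher.end() is available at the same point in the loop.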
On Tue, Oct 22, 2013 at 2:45 PM, Jörn Kottmann <[email protected]> wrote:

> On 10/05/2013 11:58 PM, Mark G wrote:
>
>> 2. Discovery of indicators for "country context" should be regex based, in
>> order to provide a more robust ability to discover context
>>
>> Currently I use a String.indexOf(term) to discover the country hit list.
>> Regex would allow users to configure interesting ways to indicate
>> countries. Regex will also provide the array of start/end I need for issue
>> 1 from its Matcher.find
>>
>
> Can we reuse the name finder for this? The user could simply provide a
> name finder which can do this depending on what is possible for him,
> e.g. trained on his data, regex based, dictionary based, etc.
>
> Jörn
>
