Currently the regex-based discovery of country context is done in a
CountryContext class, which sits behind the GeoEntityLinker impl itself.
Its regexfind method takes the full doc text as a param and returns a map
of each country code to a set of mentions in the doc:
 public Map<String, Set<Integer>> regexfind(String docText,
     EntityLinkerProperties properties)
This could be done as a NameFinder impl extension, and I did think of that
initially, but since it was specific to the GeoEntityLinker impl I didn't
bother.
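
To make the idea concrete, here is a minimal sketch of that kind of lookup. It is not the actual CountryContext code: the class name CountryContextSketch, the countryPatterns parameter (used here in place of EntityLinkerProperties), and the sample patterns are all assumptions for illustration, and I am assuming the Integer values are the start offsets taken from Matcher.find.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the real CountryContext class.
final class CountryContextSketch {

    // Maps each country code to the start offsets of its regex hits in docText.
    // countryPatterns is an assumed input: country code -> user-configured pattern.
    static Map<String, Set<Integer>> regexFind(String docText,
                                               Map<String, Pattern> countryPatterns) {
        Map<String, Set<Integer>> hits = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : countryPatterns.entrySet()) {
            Matcher matcher = entry.getValue().matcher(docText);
            // Matcher.find walks the text; start() and end() give the span of each hit.
            while (matcher.find()) {
                hits.computeIfAbsent(entry.getKey(), k -> new TreeSet<>())
                    .add(matcher.start());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("FR", Pattern.compile("\\bFran(ce|kreich)\\b"));
        patterns.put("DE", Pattern.compile("\\b(Germany|Deutschland)\\b"));
        String doc = "Trains run from France to Germany; Deutschland borders Frankreich.";
        System.out.println(regexFind(doc, patterns));
    }
}
```

Countries with no hit are simply absent from the returned map, so callers should check for a missing key rather than an empty set.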



On Tue, Oct 22, 2013 at 2:45 PM, Jörn Kottmann <[email protected]> wrote:

> On 10/05/2013 11:58 PM, Mark G wrote:
>
>> 2. Discovery of indicators for "country context" should be regex based, in
>> order to provide a more robust ability to discover context
>>
>> Currently I use a String.indexOf(term) to discover the country hit list.
>> Regex would allow users to configure interesting ways to indicate
>> countries. Regex will also provide the array of start/end I need for issue
>> 1 from its Matcher.find
>>
>
> Can we reuse the name finder for this? The user could simply provide a
> name finder which
> can do this depending on what is possible for him, e.g. trained on his
> data, regex based,
> dictionary based, etc.
>
> Jörn
>
