I'm trying to articulate what I've done with 
OPENNLP-579 <https://issues.apache.org/jira/browse/OPENNLP-579> to get some 
feedback and iterate some more... fail early, fail often, I say.

I started to document a bit. Here is what I have so far for an explanation:

How to use the OpenNLP EntityLinker framework:

Purpose and Use Cases
The OpenNLP EntityLinker framework exists to associate extracted entities 
with external data sources. For instance, the EntityLinker framework can 
provide the means to associate a discovered location entity with N records in 
a geographic gazetteer. Another case may be to associate a name with a 
database of person names (fuzzily).

Technical Overview
The framework consists of three main interfaces (EntityLinker, Linkable, and 
Formattable) and two factories:
EntityLinker and EntityLinkerFactory. The factory can return many 
EntityLinkers for a given entity type using one properties file.
Linkable and LinkableFactory. This factory can also return many Linkables for 
a given EntityLinker type from the same properties file.
The current framework assumes that all sentence detection, tokenization, and 
name finding happened outside the EntityLinker. If the 
LinkedDocumentNameFinder is used, the NER and entity linking functionality is 
encapsulated to a greater degree (see my other post).

Conceptual Design
The concept is that an EntityLinker is associated with an entity type. Every 
EntityLinker can use multiple pluggable Linkables. For instance, an 
EntityLinker implementation called GeoEntityLinker can link to several 
database gazetteers that are Linkable implementations, such as NGA GeoNames 
and USGS place names, or a Solr index of locations... the possibilities are 
endless.
The factory classes use reflection to instantiate EntityLinkers and their 
Linkables from entries configured in a properties file.
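
To make the reflection idea concrete, here is a minimal sketch of how a 
factory could turn a properties entry into instances. It is not the code in 
the OPENNLP-579 patch: the class ReflectiveFactorySketch, the SimpleLinkable 
interface, the two gazetteer classes, and the property key 
linker.location.linkables are all hypothetical names made up for 
illustration, and the comma-separated value format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class ReflectiveFactorySketch {

  /** Hypothetical stand-in for the real Linkable interface. */
  public interface SimpleLinkable {
    String name();
  }

  /** Hypothetical Linkable backed by NGA data. */
  public static class NgaGazetteer implements SimpleLinkable {
    public String name() { return "NGA"; }
  }

  /** Hypothetical Linkable backed by USGS data. */
  public static class UsgsGazetteer implements SimpleLinkable {
    public String name() { return "USGS"; }
  }

  /**
   * Reads a comma-separated list of class names stored under the given key
   * and instantiates each one through its no-argument constructor.
   */
  public static List<SimpleLinkable> loadLinkables(Properties props, String key)
      throws ReflectiveOperationException {
    List<SimpleLinkable> result = new ArrayList<>();
    String value = props.getProperty(key, "");
    for (String className : value.split(",")) {
      className = className.trim();
      if (className.isEmpty()) {
        continue; // missing key or stray comma
      }
      Class<?> clazz = Class.forName(className);
      result.add((SimpleLinkable) clazz.getDeclaredConstructor().newInstance());
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    // In practice these entries would be loaded from the properties file.
    Properties props = new Properties();
    props.setProperty("linker.location.linkables",
        NgaGazetteer.class.getName() + "," + UsgsGazetteer.class.getName());

    for (SimpleLinkable linkable : loadLinkables(props, "linker.location.linkables")) {
      System.out.println("Loaded linkable: " + linkable.name());
    }
  }
}
```

The appeal of this approach is that adding a new data source only requires a 
new class on the classpath and one more entry in the properties file.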

Here are the interface signatures:

EntityLinker

public interface EntityLinker<T extends Set<? extends Span>> {
  // not used currently
  T find(String[] tokens, Span[] spans, List<Class> linkables);
  T find(String[] tokens, Span[] spans);
}

Linkable (an EntityLinker impl utilizes many Linkables)

public interface Linkable<T extends Set<? extends BaseLink>> extends Formattable {
  static LinkableFactory factory = LinkableFactory.getInstance();
  T find(String textToSearchFor);
  T find(String locationText, List<String> whereConditions);
  T getHierarchyFor(BaseLink entry);
}
// Formats an entity's text to prepare it for use as a search string for a
// particular system.
public interface Formattable {
  String format(String entity);
}
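
As a rough illustration of how format and find could work together, here is 
a small in-memory sketch. It is deliberately simplified and is not part of 
the patch: InMemoryLinkableSketch and InMemoryGazetteer are hypothetical 
names, find returns plain record strings instead of BaseLink objects, and the 
normalization rule (trim, collapse whitespace, lowercase) is only an assumed 
example of what a backend might need.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class InMemoryLinkableSketch {

  // Copied from the listing above so the sketch compiles on its own.
  interface Formattable {
    String format(String entity);
  }

  // Hypothetical, simplified Linkable: it returns matching record names
  // rather than a Set of BaseLink objects.
  static class InMemoryGazetteer implements Formattable {
    private final Set<String> records;

    InMemoryGazetteer(Set<String> records) {
      this.records = records;
    }

    // Assumed normalization: trim, collapse whitespace, lowercase.
    @Override
    public String format(String entity) {
      return entity.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Formats the query first, then does a naive substring match.
    Set<String> find(String textToSearchFor) {
      String query = format(textToSearchFor);
      Set<String> hits = new LinkedHashSet<>();
      for (String record : records) {
        if (format(record).contains(query)) {
          hits.add(record);
        }
      }
      return hits;
    }
  }

  public static void main(String[] args) {
    Set<String> records = new LinkedHashSet<>();
    records.add("Springfield, Illinois");
    records.add("Springfield, Missouri");
    records.add("Boston, Massachusetts");

    InMemoryGazetteer gazetteer = new InMemoryGazetteer(records);
    System.out.println(gazetteer.find("  SPRINGFIELD "));
  }
}
```

A database or Solr backed Linkable would replace the substring loop with a 
query, but the format step would play the same role of adapting the entity 
text to that system's search syntax.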



Mark Giaconia
