Trying to articulate what I've done with
OPENNLP-579 <https://issues.apache.org/jira/browse/OPENNLP-579> to get some
feedback and iterate some more... fail early, fail often, I say.
I've started documenting it a bit; here is what I have so far for an explanation:
How to use the OpenNLP EntityLinker framework:
Purpose and Use Cases
The OpenNLP EntityLinker framework exists to associate extracted entities with
external data sources. For instance, the EntityLinker framework can associate a
discovered location entity with N records in a geographic gazetteer. Another
case is associating a person's name with records in a database of person names
(fuzzily).
Technical Overview
The framework consists of three main interfaces (EntityLinker, Linkable, and
Formattable) and two factories:
EntityLinker and EntityLinkerFactory. The factory can return many EntityLinkers
for a given entity type using one properties file.
Linkable and LinkableFactory. This factory can also return many Linkables for a
given EntityLinker type from the same properties file.
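A minimal sketch of how one properties file could drive both factories. The key convention here is a hypothetical assumption, not the patch's actual format: `linker.<entityType>` lists EntityLinker class names, and `linkables.<linkerClass>` lists the Linkable class names for that linker, both comma-separated.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class LinkerConfigSketch {

  // Hypothetical convention: each key maps to a comma-separated list of class names.
  static List<String> classNames(Properties props, String key) {
    List<String> names = new ArrayList<>();
    for (String name : props.getProperty(key, "").split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        names.add(trimmed);
      }
    }
    return names;
  }

  // EntityLinker class names configured for an entity type such as "location".
  static List<String> linkersFor(Properties props, String entityType) {
    return classNames(props, "linker." + entityType);
  }

  // Linkable class names configured for one EntityLinker class.
  static List<String> linkablesFor(Properties props, String linkerClass) {
    return classNames(props, "linkables." + linkerClass);
  }

  // Loads properties from text; a real factory would read the configured file instead.
  static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    Properties props = load(
        "linker.location=example.GeoEntityLinker\n"
            + "linkables.example.GeoEntityLinker=example.GeonamesLinkable, example.UsgsLinkable\n");
    System.out.println(linkersFor(props, "location"));
    System.out.println(linkablesFor(props, "example.GeoEntityLinker"));
  }
}
```

Keeping both lookups in one file means adding a new data source is a configuration change rather than a code change.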
The current framework assumes that sentence detection, tokenization, and name
finding all happen outside the EntityLinker. If the LinkedDocumentNameFinder is
used, NER and entity linking are encapsulated to a greater degree (see my other
post).
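A small sketch of what the linker receives when everything upstream runs externally: the token array and the name finder's spans. To stay self-contained it uses a hypothetical `TokenSpan` stand-in instead of `opennlp.tools.util.Span`, and the tokens and spans are assumed name-finder output, not real model results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExternalPipelineSketch {

  // Hypothetical stand-in for opennlp.tools.util.Span: token offsets [start, end) plus a type.
  record TokenSpan(int start, int end, String type) {}

  // Sentence detection, tokenization, and name finding have already run;
  // the linker only turns each span back into the entity text it will search for.
  static List<String> entityTexts(String[] tokens, TokenSpan[] spans) {
    List<String> texts = new ArrayList<>();
    for (TokenSpan span : spans) {
      texts.add(String.join(" ", Arrays.copyOfRange(tokens, span.start(), span.end())));
    }
    return texts;
  }

  public static void main(String[] args) {
    // Assumed output of an external tokenizer and name finder for one sentence.
    String[] tokens = {"He", "flew", "from", "New", "York", "to", "Boston", "."};
    TokenSpan[] spans = {new TokenSpan(3, 5, "location"), new TokenSpan(6, 7, "location")};
    System.out.println(entityTexts(tokens, spans)); // [New York, Boston]
  }
}
```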
Conceptual Design
The concept is that each EntityLinker is associated with one entity type, and
every EntityLinker can use multiple pluggable Linkables. For instance, an
EntityLinker implementation called GeoEntityLinker can link to several gazetteer
databases that are Linkable implementations, such as NGA Geonames and USGS
place names, or a Solr index of locations; any data source that implements
Linkable can be plugged in.
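A simplified sketch of the fan-out idea: one linker bound to the location type queries every plugged-in source and merges the results. `Link`, `Source`, and `MapGazetteer` are hypothetical simplifications of BaseLink and Linkable, with in-memory maps standing in for Geonames, USGS, or a Solr index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GeoLinkerSketch {

  // Hypothetical simplification of BaseLink: which source matched, and the record id.
  record Link(String source, String id, String name) {}

  // Hypothetical simplification of Linkable: one pluggable data source.
  interface Source {
    List<Link> find(String text);
  }

  // In-memory gazetteer keyed by lower-cased place name.
  record MapGazetteer(String name, Map<String, String> idsByPlace) implements Source {
    public List<Link> find(String text) {
      String id = idsByPlace.get(text.toLowerCase());
      return id == null ? List.of() : List.of(new Link(name, id, text));
    }
  }

  // A linker for one entity type that queries every configured source.
  record GeoEntityLinker(List<Source> sources) {
    List<Link> link(String entityText) {
      List<Link> links = new ArrayList<>();
      for (Source source : sources) {
        links.addAll(source.find(entityText));
      }
      return links;
    }
  }

  public static void main(String[] args) {
    GeoEntityLinker linker = new GeoEntityLinker(List.of(
        new MapGazetteer("geonames", Map.of("boston", "gn-1")),
        new MapGazetteer("usgs", Map.of("boston", "usgs-7", "denver", "usgs-9"))));
    System.out.println(linker.link("Boston").size()); // 2
    System.out.println(linker.link("Denver").size()); // 1
  }
}
```

Because the linker only depends on the `Source` abstraction, adding another gazetteer means adding one more element to the list.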
The factory classes use reflection to instantiate EntityLinkers and their
Linkables from the class names configured in a properties file.
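A minimal sketch of reflection-based instantiation as a factory might do it, using only `Class.forName` and `getDeclaredConstructor().newInstance()`. `Plugin` and `UpperCasePlugin` are hypothetical placeholders for Linkable or EntityLinker implementations, and a public no-arg constructor is assumed.

```java
import java.util.ArrayList;
import java.util.List;

public class ReflectionFactorySketch {

  // Hypothetical plugin type standing in for Linkable or EntityLinker.
  public interface Plugin {
    String name();
  }

  public static class UpperCasePlugin implements Plugin {
    public UpperCasePlugin() {}

    public String name() {
      return "upper";
    }
  }

  // Creates one instance per configured class name via its public no-arg constructor,
  // and checks that each instance really implements the expected type.
  static <T> List<T> instantiate(List<String> classNames, Class<T> type) {
    List<T> instances = new ArrayList<>();
    for (String className : classNames) {
      try {
        Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
        instances.add(type.cast(instance));
      } catch (ReflectiveOperationException | ClassCastException e) {
        throw new IllegalArgumentException("Cannot create " + className, e);
      }
    }
    return instances;
  }

  public static void main(String[] args) {
    // In the framework these names would come from the properties file.
    List<Plugin> plugins = instantiate(List.of(UpperCasePlugin.class.getName()), Plugin.class);
    System.out.println(plugins.get(0).name()); // upper
  }
}
```

Failing fast with the offending class name makes a typo in the properties file easy to spot.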
Here are the interface signatures:
EntityLinker
import java.util.List;
import java.util.Set;
import opennlp.tools.util.Span;

public interface EntityLinker<T extends Set<? extends Span>> {
  T find(String[] tokens, Span[] spans, List<Class> linkables); // not used currently
  T find(String[] tokens, Span[] spans);
}
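A sketch of what the generic bound `T extends Set<? extends Span>` allows: an implementation can return a set of a Span subtype that carries its links, so callers read them without casting. `Span` here is a local stand-in for the OpenNLP class, and `LinkedSpan` and `DictionaryLinker` are hypothetical names for illustration; the unused `List<Class>` overload is omitted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EntityLinkerShapeSketch {

  // Local stand-in for opennlp.tools.util.Span (token offsets plus type).
  static class Span {
    final int start, end;
    final String type;

    Span(int start, int end, String type) {
      this.start = start;
      this.end = end;
      this.type = type;
    }
  }

  // Hypothetical Span subtype that carries the ids it was linked to.
  static class LinkedSpan extends Span {
    final List<String> linkIds;

    LinkedSpan(Span span, List<String> linkIds) {
      super(span.start, span.end, span.type);
      this.linkIds = linkIds;
    }
  }

  // Same shape as EntityLinker, minus the unused List<Class> overload.
  interface EntityLinker<T extends Set<? extends Span>> {
    T find(String[] tokens, Span[] spans);
  }

  // Binding T to Set<LinkedSpan> satisfies the bound and exposes the links directly.
  static class DictionaryLinker implements EntityLinker<Set<LinkedSpan>> {
    private final Map<String, List<String>> ids;

    DictionaryLinker(Map<String, List<String>> ids) {
      this.ids = ids;
    }

    public Set<LinkedSpan> find(String[] tokens, Span[] spans) {
      Set<LinkedSpan> linked = new HashSet<>();
      for (Span span : spans) {
        String text = String.join(" ", Arrays.copyOfRange(tokens, span.start, span.end));
        linked.add(new LinkedSpan(span, ids.getOrDefault(text, List.of())));
      }
      return linked;
    }
  }

  public static void main(String[] args) {
    String[] tokens = {"Visit", "Boston", "today"};
    Set<LinkedSpan> result = new DictionaryLinker(Map.of("Boston", List.of("gn-1")))
        .find(tokens, new Span[] {new Span(1, 2, "location")});
    System.out.println(result.iterator().next().linkIds); // [gn-1]
  }
}
```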
Linkable (an EntityLinker implementation uses many Linkables):
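import java.util.List;
import java.util.Set;

// BaseLink and LinkableFactory are defined elsewhere in the OPENNLP-579 work.
public interface Linkable<T extends Set<? extends BaseLink>> extends Formattable {
  // interface fields are implicitly public static final
  static LinkableFactory factory = LinkableFactory.getInstance();

  T find(String textToSearchFor);

  T find(String locationText, List<String> whereConditions);

  T getHierarchyFor(BaseLink entry);
}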
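A sketch of one possible Linkable-style data source backed by an in-memory list, showing all three lookups. Everything here is an assumption for illustration: `Place` stands in for BaseLink, where-conditions are assumed to look like `featureClass=P` (the real implementations may pass SQL or Solr filters), and the hierarchy is assumed to be a parent-id chain.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InMemoryGazetteerSketch {

  // Hypothetical stand-in for BaseLink: id, name, feature class, parent id (null at the top).
  record Place(String id, String name, String featureClass, String parentId) {}

  private final Map<String, Place> byId = new LinkedHashMap<>();

  InMemoryGazetteerSketch(List<Place> places) {
    for (Place p : places) {
      byId.put(p.id(), p);
    }
  }

  // Mirrors find(String): every record whose name matches, ignoring case.
  Set<Place> find(String text) {
    Set<Place> hits = new LinkedHashSet<>();
    for (Place p : byId.values()) {
      if (p.name().equalsIgnoreCase(text)) {
        hits.add(p);
      }
    }
    return hits;
  }

  // Mirrors find(String, List<String>). Assumption: conditions look like "featureClass=P";
  // only that key is handled here and other conditions are ignored.
  Set<Place> find(String text, List<String> whereConditions) {
    Set<Place> hits = find(text);
    for (String condition : whereConditions) {
      if (condition.startsWith("featureClass=")) {
        String wanted = condition.substring("featureClass=".length());
        hits.removeIf(p -> !p.featureClass().equals(wanted));
      }
    }
    return hits;
  }

  // Mirrors getHierarchyFor: the entry followed by its ancestors, nearest first.
  // Stops at the root, or if a parent cycle is detected (add() returns false).
  Set<Place> getHierarchyFor(Place entry) {
    Set<Place> chain = new LinkedHashSet<>();
    Place current = entry;
    while (current != null && chain.add(current)) {
      current = current.parentId() == null ? null : byId.get(current.parentId());
    }
    return chain;
  }

  public static void main(String[] args) {
    Place usa = new Place("1", "United States", "A", null);
    Place mass = new Place("2", "Massachusetts", "A", "1");
    Place city = new Place("3", "Boston", "P", "2");
    Place hill = new Place("4", "Boston", "T", "1");
    InMemoryGazetteerSketch gazetteer = new InMemoryGazetteerSketch(List.of(usa, mass, city, hill));
    System.out.println(gazetteer.find("boston").size()); // 2
    System.out.println(gazetteer.find("Boston", List.of("featureClass=P")).size()); // 1
    System.out.println(gazetteer.getHierarchyFor(city).size()); // 3
  }
}
```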
// Formats an entity's text so it can be used as a search string for a particular system
public interface Formattable {
  String format(String entity);
}
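A sketch of one Formattable implementation for a query-syntax backend. `QueryFormatter` is a hypothetical name, and the escaped character set is an illustrative assumption loosely modeled on Lucene/Solr-style query syntax, not an exhaustive or authoritative list.

```java
public class FormattableSketch {

  // Same shape as the Formattable interface above.
  interface Formattable {
    String format(String entity);
  }

  // Hypothetical formatter for a Lucene/Solr-style query string.
  static class QueryFormatter implements Formattable {
    // Illustrative set of characters to backslash-escape; adjust for the target system.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\/";

    @Override
    public String format(String entity) {
      // Collapse runs of whitespace left over from tokenization.
      String normalized = entity.trim().replaceAll("\\s+", " ");
      StringBuilder out = new StringBuilder();
      for (char c : normalized.toCharArray()) {
        if (SPECIAL.indexOf(c) >= 0) {
          out.append('\\');
        }
        out.append(c);
      }
      return out.toString();
    }
  }

  public static void main(String[] args) {
    Formattable formatter = new QueryFormatter();
    System.out.println(formatter.format("  St. John's   (Newfoundland) "));
  }
}
```

Putting this step behind an interface lets each Linkable escape or normalize text for its own backend (SQL, Solr, a REST API) without the EntityLinker knowing the details.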
Mark Giaconia