[
https://issues.apache.org/jira/browse/OPENNLP-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691451#comment-13691451
]
Mark Giaconia commented on OPENNLP-579:
---------------------------------------
A couple of thoughts. I completed the changes, but while implementing the
GeoEntityLinker I realized it would be useful (perhaps necessary in some cases)
to have the overloads below in the EntityLinker interface. Let me know what
you think. Descriptions follow; sorry for the long post.
List<T> find(String text, Span sentences[], Span tokens[], Span nameSpans[],
int sentenceIndex); // overloaded with int sentenceIndex
List<T> find(String text, Span sentences[], String tokens[], Span
nameSpans[]); // tokens are String[], not Span[]
Descriptions:
List<T> find(String text, Span sentences[], Span tokens[], Span nameSpans[],
int sentenceIndex);//overloaded with int sentenceIndex
This method takes an int sentenceIndex parameter that indexes into sentences[],
so that when a user generates a String[] of tokens from tokens[] and then uses
nameSpans[] to build the String[] of names for the search, they know which
sentence to use. This is useful when externally iterating over sentences,
getting names, and linking the names. Without the int overload, the user would
have to hard-code an index into sentences[] inside the EntityLinker find
method, always pass the sentence they want as the first element, or pass only
one element in sentences[].
Here's an example from my GeoEntityLinker impl:
@Override
public List<LinkedSpan> find(String text, Span[] sentences, Span[] tokens,
    Span[] names, int sentenceIndex) {
  List<LinkedSpan> spans = new ArrayList<LinkedSpan>(); // results to return
  // get the sentence from text using sentenceIndex; building the array of
  // sentence strings on every call will be inefficient on large documents
  String sentenceINeedTokensFor = Span.spansToStrings(sentences,
      text)[sentenceIndex];
  // get the String[] tokens I need to get the names
  String[] stringtokens = Span.spansToStrings(tokens,
      sentenceINeedTokensFor);
  // get the names based on the tokens
  String[] matches = Span.spansToStrings(names, stringtokens);
  for (int i = 0; i < matches.length; i++) {
    // process...
  }
  return spans;
}
List<T> find(String text, Span sentences[], String tokens[], Span
nameSpans[]);
This method takes a String[] of tokens rather than a Span[] of tokens, which
eliminates the problem above. The user has what they need to generate names
from tokens[] and nameSpans[], and only needs to touch sentences[] and text if
desired.
This allows for simpler processing and is much more efficient, because a
sentence array does not have to be generated on every call just to get the
tokens as a String[].
@Override
public List<LinkedSpan> find(String text, Span[] sentences, String[] tokens,
    Span[] names) {
  List<LinkedSpan> spans = new ArrayList<LinkedSpan>(); // results to return
  // just get the names using tokens[] and names[]
  String[] matches = Span.spansToStrings(names, tokens);
  for (int i = 0; i < matches.length; i++) {
    // process...
  }
  return spans;
}
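
To make the difference between the two overloads concrete, here is a small
self-contained sketch that does not depend on OpenNLP. SimpleSpan is a
hypothetical stand-in for opennlp.tools.util.Span, and the class and helper
names (TokenOverloadSketch, coveredText, namesFromTokens,
namesViaSentenceIndex) are made up for illustration. It assumes token spans
are character offsets relative to their sentence and name spans are token
indexes with an exclusive end. Unlike my impl above, it cuts out only the one
sentence it needs instead of converting every sentence span.

```java
public class TokenOverloadSketch {

    /** Hypothetical stand-in for opennlp.tools.util.Span: half-open [start, end). */
    public static final class SimpleSpan {
        final int start;
        final int end;

        public SimpleSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Character-offset spans to the substrings they cover. */
    public static String[] coveredText(SimpleSpan[] spans, String text) {
        String[] out = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            out[i] = text.substring(spans[i].start, spans[i].end);
        }
        return out;
    }

    /** Token-index spans to names, joining the covered tokens with spaces. */
    public static String[] namesFromTokens(SimpleSpan[] nameSpans, String[] tokens) {
        String[] out = new String[nameSpans.length];
        for (int i = 0; i < nameSpans.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int t = nameSpans[i].start; t < nameSpans[i].end; t++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(tokens[t]);
            }
            out[i] = sb.toString();
        }
        return out;
    }

    /** Span[] tokens route: the sentence text must be recovered first. */
    public static String[] namesViaSentenceIndex(String text, SimpleSpan[] sentences,
            SimpleSpan[] tokens, SimpleSpan[] nameSpans, int sentenceIndex) {
        SimpleSpan s = sentences[sentenceIndex];
        String sentence = text.substring(s.start, s.end);
        String[] stringTokens = coveredText(tokens, sentence);
        return namesFromTokens(nameSpans, stringTokens);
    }

    public static void main(String[] args) {
        String text = "I live here. She flew to New York City.";
        SimpleSpan[] sentences = { new SimpleSpan(0, 12), new SimpleSpan(13, 39) };
        // Token offsets are relative to the second sentence.
        SimpleSpan[] tokenSpans = {
            new SimpleSpan(0, 3), new SimpleSpan(4, 8), new SimpleSpan(9, 11),
            new SimpleSpan(12, 15), new SimpleSpan(16, 20), new SimpleSpan(21, 25),
            new SimpleSpan(25, 26)
        };
        // One name covering tokens 3, 4 and 5.
        SimpleSpan[] nameSpans = { new SimpleSpan(3, 6) };

        // Overload 1: Span[] tokens plus sentenceIndex.
        String[] viaIndex = namesViaSentenceIndex(text, sentences, tokenSpans, nameSpans, 1);
        // Overload 2: the caller already holds String[] tokens.
        String[] stringTokens = { "She", "flew", "to", "New", "York", "City", "." };
        String[] direct = namesFromTokens(nameSpans, stringTokens);

        System.out.println(String.join(", ", viaIndex));
        System.out.println(String.join(", ", direct));
    }
}
```

Both routes produce the same names. The second one never touches text or
sentences[], which is the simplification the String[] overload is after.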
> Framework to dynamically link N-best matches from external data to named
> entities by type (EntityLinker framework)
> ------------------------------------------------------------------------------------------------------------------
>
> Key: OPENNLP-579
> URL: https://issues.apache.org/jira/browse/OPENNLP-579
> Project: OpenNLP
> Issue Type: Wish
> Components: Name Finder
> Affects Versions: 1.6.0
> Environment: Any
> Reporter: Mark Giaconia
> Priority: Minor
> Labels: features
> Fix For: 1.6.0
>
> Attachments: EntityLinker_13Jun2013.zip, EntityLinker_30may2013.zip,
> entitylinker_8Jun2013.zip, entitylinker_9Jun2013.zip,
> entitylinkerFramework.zip, geonamefinder.properties, geonamefind.zip
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> A framework for integrating/linking external data to named entities. For
> instance, geocoding or georeferencing location entities to geonames gazetteers
> can be implemented as an EntityLinker. Initially created ticket to
> specifically solve the georeferencing problem, but the framework should allow
> linkage of any external data to any entity type. Commercial applications that
> do this are expensive, and there are many free gazetteers one could use to
> create solutions with OpenNLP. The capability should provide a default
> implementation using MySQL or Postgres and the USGS/Geonames Gazetteers.