[ 
https://issues.apache.org/jira/browse/OPENNLP-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691451#comment-13691451
 ] 

Mark Giaconia commented on OPENNLP-579:
---------------------------------------

A couple of thoughts. I completed the changes, but while implementing the
GeoEntityLinker I realized it would be useful (perhaps necessary in some cases)
to have the overloads below in the EntityLinker interface. Let me know what
you think. Descriptions follow; sorry for the long post.

  // overloaded with int sentenceIndex
  List<T> find(String text, Span sentences[], Span tokens[], Span nameSpans[],
      int sentenceIndex);
  // tokens are String[], not Span[]
  List<T> find(String text, Span sentences[], String tokens[], Span nameSpans[]);

Descriptions:

  // overloaded with int sentenceIndex
  List<T> find(String text, Span sentences[], Span tokens[], Span nameSpans[],
      int sentenceIndex);

This method takes an int sentenceIndex parameter, an index into sentences[], so
that when a user generates a String[] of tokens from tokens[] and then uses
nameSpans[] to build the String[] of names for the search, they know which
sentence to use. This is useful when iterating over sentences externally,
getting the names, and linking them. Without the int overload, the user would
have to hard-code an index into sentences[] inside the EntityLinker find
method, always pass the sentence they want as the first element, or pass only
a single element in sentences[].
Here's an example from my GeoEntityLinker implementation:

  @Override
  public List<LinkedSpan> find(String text, Span[] sentences, Span[] tokens,
      Span[] names, int sentenceIndex) {
    // Linked results to return.
    List<LinkedSpan> spans = new ArrayList<LinkedSpan>();
    // Get the sentence from the text using sentenceIndex. Building the array
    // of sentence strings on every call will be inefficient on large documents.
    String sentenceINeedTokensFor =
        Span.spansToStrings(sentences, text)[sentenceIndex];
    // Get the String[] tokens needed to resolve the names.
    String[] stringtokens = Span.spansToStrings(tokens, sentenceINeedTokensFor);
    // Get the names based on the tokens.
    String[] matches = Span.spansToStrings(names, stringtokens);
    for (int i = 0; i < matches.length; i++) {
      // process...
    }
    return spans;
  }
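
As a rough illustration of the calling side this overload is aimed at (external
iteration over sentences), here is a minimal self-contained sketch. It is not
OpenNLP code: the nested Span class is a stand-in for opennlp.tools.util.Span,
and Linker, EchoLinker, linkAll and demo are hypothetical names made up for the
example. It assumes token spans are character offsets relative to their
sentence and name spans are token indices, as in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceIndexSketch {

  /** Stand-in for opennlp.tools.util.Span: half-open [start, end) offsets. */
  static final class Span {
    final int start;
    final int end;
    Span(int start, int end) { this.start = start; this.end = end; }
  }

  /** Hypothetical cut-down linker exposing only the sentenceIndex overload. */
  interface Linker {
    List<String> find(String text, Span[] sentences, Span[] tokens,
        Span[] names, int sentenceIndex);
  }

  /** Toy linker: "links" a name by simply returning its surface string. */
  static final class EchoLinker implements Linker {
    @Override
    public List<String> find(String text, Span[] sentences, Span[] tokens,
        Span[] names, int sentenceIndex) {
      // Only the requested sentence is cut out of the text.
      Span s = sentences[sentenceIndex];
      String sentence = text.substring(s.start, s.end);
      List<String> linked = new ArrayList<>();
      for (Span name : names) {
        // A name span covers tokens name.start .. name.end - 1.
        StringBuilder sb = new StringBuilder();
        for (int t = name.start; t < name.end; t++) {
          if (sb.length() > 0) sb.append(' ');
          sb.append(sentence, tokens[t].start, tokens[t].end);
        }
        linked.add(sb.toString());
      }
      return linked;
    }
  }

  /** External iteration: one find call per sentence, passing its index. */
  static List<String> linkAll(Linker linker, String text, Span[] sentences,
      Span[][] tokensPerSentence, Span[][] namesPerSentence) {
    List<String> all = new ArrayList<>();
    for (int i = 0; i < sentences.length; i++) {
      all.addAll(linker.find(text, sentences, tokensPerSentence[i],
          namesPerSentence[i], i));
    }
    return all;
  }

  static List<String> demo() {
    String text = "Paris is nice. I like New York.";
    Span[] sentences = { new Span(0, 14), new Span(15, 31) };
    Span[][] tokens = {
      { new Span(0, 5), new Span(6, 8), new Span(9, 13), new Span(13, 14) },
      { new Span(0, 1), new Span(2, 6), new Span(7, 10), new Span(11, 15),
        new Span(15, 16) }
    };
    Span[][] names = { { new Span(0, 1) }, { new Span(2, 4) } };
    return linkAll(new EchoLinker(), text, sentences, tokens, names);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [Paris, New York]
  }
}
```

The caller never has to reorder or trim sentences[]; the index alone tells the
linker which sentence the tokens and names belong to.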


  List<T> find(String text, Span sentences[], String tokens[], Span nameSpans[]);

This method takes the tokens as a String[] rather than a Span[], which
eliminates the problem above. The user has what they need to generate the
names from tokens[] and nameSpans[], and only needs to touch the sentences and
text if desired.
This allows for simpler processing and is much more efficient, because a
sentence array does not have to be generated on every call just to get the
tokens as a String[].

  @Override
  public List<LinkedSpan> find(String text, Span[] sentences, String[] tokens,
      Span[] names) {
    // Linked results to return.
    List<LinkedSpan> spans = new ArrayList<LinkedSpan>();
    // Just get the names using tokens[] and names[].
    String[] matches = Span.spansToStrings(names, tokens);
    for (int i = 0; i < matches.length; i++) {
      // process...
    }
    return spans;
  }
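
To make the token-index lookup concrete, here is a small self-contained sketch
of what resolving name spans against a String[] of tokens amounts to. The
nested Span class is a stand-in, and namesFromTokens is a hypothetical helper
that mirrors the behavior I understand Span.spansToStrings(Span[], String[]) to
have (joining tokens start through end - 1 with single spaces); it is not the
library code.

```java
import java.util.Arrays;

public class TokenNamesSketch {

  /** Stand-in for opennlp.tools.util.Span: half-open [start, end) token range. */
  static final class Span {
    final int start;
    final int end;
    Span(int start, int end) { this.start = start; this.end = end; }
  }

  /** Hypothetical helper: joins the tokens covered by each name span. */
  static String[] namesFromTokens(Span[] names, String[] tokens) {
    String[] out = new String[names.length];
    for (int i = 0; i < names.length; i++) {
      String[] covered = Arrays.copyOfRange(tokens, names[i].start, names[i].end);
      out[i] = String.join(" ", covered);
    }
    return out;
  }

  public static void main(String[] args) {
    String[] tokens = { "I", "like", "New", "York", "." };
    Span[] names = { new Span(2, 4) };
    // No sentence or document text is needed to recover the name.
    System.out.println(Arrays.toString(namesFromTokens(names, tokens)));
    // prints [New York]
  }
}
```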

                
> Framework to dynamically link N-best matches from external data to named 
> entities by type (EntityLinker framework)
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-579
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-579
>             Project: OpenNLP
>          Issue Type: Wish
>          Components: Name Finder
>    Affects Versions: 1.6.0
>         Environment: Any
>            Reporter: Mark Giaconia
>            Priority: Minor
>              Labels: features
>             Fix For: 1.6.0
>
>         Attachments: EntityLinker_13Jun2013.zip, EntityLinker_30may2013.zip, 
> entitylinker_8Jun2013.zip, entitylinker_9Jun2013.zip, 
> entitylinkerFramework.zip, geonamefinder.properties, geonamefind.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> A framework for integrating/linking external data to named entities. For 
> instance, geocoding or georeferencing location entities to geonames gazetteers 
> can be implemented as an EntityLinker. The ticket was initially created 
> specifically to solve the georeferencing problem, but the framework should 
> allow linkage of any external data to any entity type. Commercial applications 
> that do this are expensive, and there are many free gazetteers one could use 
> to create solutions with OpenNLP. The capability should provide a default 
> implementation using MySQL or Postgres and the USGS/Geonames gazetteers.
