Hello,

As far as I understand the current EntityLinker proposal, it already operates at
the document level: an entire document is passed to its find method.

In OpenNLP the user is responsible for all the pre-processing, because the
requirements often vary in small details: some users need access to both the
tokens and the sentence segmentation, some only need the tokenization, some
need nothing at all, some already have the sentences, and so on.
The code for this logic is usually very simple, and writing it themselves lets
users integrate the components in whatever way fits their application.
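
To illustrate how little glue code this takes, here is a minimal sketch. It is
only an illustration under assumptions: the sentence detector and the tokenizer
are passed in as plain functions, and the class name PreprocessingSketch, the
method segment and the regex-based splitters are hypothetical stand-ins, not
part of the OpenNLP API. A real application would pass in its actual sentence
detector and tokenizer instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of user-side pre-processing; not OpenNLP code.
class PreprocessingSketch {

    // Splits the text into sentences, then each sentence into tokens.
    // Both steps are supplied by the caller, so a user who already has
    // sentences (or needs no tokens) can skip or replace either one.
    static List<List<String>> segment(String text,
            Function<String, String[]> sentenceDetector,
            Function<String, String[]> tokenizer) {
        List<List<String>> tokensPerSentence = new ArrayList<>();
        for (String sentence : sentenceDetector.apply(text)) {
            tokensPerSentence.add(Arrays.asList(tokenizer.apply(sentence)));
        }
        return tokensPerSentence;
    }

    public static void main(String[] args) {
        // Naive regex stand-ins (assumptions for the demo only).
        Function<String, String[]> sentences = s -> s.split("(?<=\\.)\\s+");
        Function<String, String[]> tokens = s -> s.split("\\s+");
        System.out.println(segment("A b. C d.", sentences, tokens));
    }
}
```

The per-sentence token lists can then be handed to whatever name finders and
linker the application uses, in whatever order it needs.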

Jörn

On 06/02/2013 10:40 PM, Giaconia, Mark [USA] wrote:
As part of working on the EntityLinker (issue
OPENNLP-579 <https://issues.apache.org/jira/browse/OPENNLP-579>), I created a
new interface and a default implementation called
LinkableDocumentNameFinder / DefaultLinkableDocumentNameFinderImpl.
Here are the method signatures for the interface:

public interface LinkableDocumentNameFinder {
   Document find(String[] sentences, Tokenizer tokenizer,
         List<TokenNameFinder> nameFinders, boolean linkable);
   Document find(String documentText, SentenceDetector sentenceDetector,
         Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
   Document find(List<Sentence> sentences, Tokenizer tokenizer,
         List<TokenNameFinder> nameFinders, boolean linkable);
   Document find(Document document, SentenceDetector sentenceDetector,
         Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
   List<Document> find(List<Document> documents, SentenceDetector sentenceDetector,
         Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
}

Notice the Document return type. Here is what a Document object looks like:

public class Document {
   private List<Sentence> sentences = new ArrayList<>();

   public List<Sentence> getSentences() {
     return sentences;
   }

   public void setSentences(List<Sentence> sentences) {
     this.sentences = sentences;
   }
}

Notice the Sentence object. Here it is:

public class Sentence {
   private String sentenceText;
   private Integer sentenceNumber;
   private List<String> tokens = new ArrayList<>();
   private List<Span> spans = new ArrayList<>();

   public Sentence(String sentenceText, Integer sentenceNumber)  {
     this.sentenceNumber = sentenceNumber;
     this.sentenceText = sentenceText;
   }
   // setters and getters omitted
}


Mark Giaconia


