Hello,
as far as I understand the current proposal, the EntityLinker already
operates at the document level: an entire document is passed to its find method.
In OpenNLP the user is responsible for doing all the pre-processing,
because the requirements often vary in small details,
e.g. some users need access to both the token and sentence segmentation, some
only to the tokenization,
some don't need anything, some already have the sentences, etc.
The code to write this logic is usually very simple and lets users
integrate the components in whatever way fits their application.
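To make the point about caller-side pre-processing concrete, here is a minimal sketch of the glue code a user might write. It assumes each step is passed in as a plain `Function<String, String[]>`; with OpenNLP on the classpath you could pass `sentenceDetector::sentDetect` and `tokenizer::tokenize`, since both methods take a `String` and return a `String[]`. The class name `PreprocessingSketch` and the regex splitters in `main` are hypothetical placeholders, not OpenNLP models.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of caller-side pre-processing. The two Function
 * parameters stand in for OpenNLP components (e.g. sentenceDetector::sentDetect
 * and tokenizer::tokenize); the caller decides which steps to run.
 */
class PreprocessingSketch {

    // Splits the text into sentences, then tokenizes each sentence.
    static List<String[]> sentencesToTokens(String text,
                                            Function<String, String[]> sentenceSplitter,
                                            Function<String, String[]> tokenizer) {
        List<String[]> result = new ArrayList<>();
        for (String sentence : sentenceSplitter.apply(text)) {
            result.add(tokenizer.apply(sentence));
        }
        return result;
    }

    public static void main(String[] args) {
        // Naive placeholder splitters; swap in real OpenNLP components as needed.
        Function<String, String[]> naiveSentences = s -> s.split("(?<=[.!?])\\s+");
        Function<String, String[]> naiveTokens = s -> s.split("\\s+");

        List<String[]> tokens = sentencesToTokens(
                "Alice lives in Berlin. Bob works on OpenNLP.", naiveSentences, naiveTokens);
        for (String[] sentenceTokens : tokens) {
            System.out.println(Arrays.toString(sentenceTokens));
        }
    }
}
```

A user who already has sentences simply skips the first step and calls the tokenizer directly, which is the flexibility described above.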
Jörn
On 06/02/2013 10:40 PM, Giaconia, Mark [USA] wrote:
As part of working on the EntityLinker (issue
OPENNLP-579<https://issues.apache.org/jira/browse/OPENNLP-579>), I created a
new interface and a default implementation
called LinkableDocumentNameFinder/DefaultLinkableDocumentNameFinderImpl.
Here are the method signatures for the interface:
import java.util.List;

import opennlp.tools.namefind.TokenNameFinder;
import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.tokenize.Tokenizer;

public interface LinkableDocumentNameFinder {
    Document find(String[] sentences, Tokenizer tokenizer,
        List<TokenNameFinder> nameFinders, boolean linkable);
    Document find(String documentText, SentenceDetector sentenceDetector,
        Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
    Document find(List<Sentence> sentences, Tokenizer tokenizer,
        List<TokenNameFinder> nameFinders, boolean linkable);
    Document find(Document document, SentenceDetector sentenceDetector,
        Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
    List<Document> find(List<Document> documents, SentenceDetector sentenceDetector,
        Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
}
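Several of these overloads differ only in how much pre-processing has already been done, so one option is to keep a single core method and make the others thin wrappers that split and delegate. The sketch below illustrates that pattern with simplified, hypothetical types: `SimpleDocumentFinder` and `TokenizingFinder` are illustrative names, `Function<String, String[]>` stands in for `SentenceDetector`/`Tokenizer`, and a `List<List<String>>` (tokens per sentence) stands in for `Document`. Name finding and the `linkable` flag are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical, simplified overload set: only the String[] overload is
 * abstract; the document-text overload is a default method that runs
 * sentence detection and then delegates to it.
 */
interface SimpleDocumentFinder {

    // Core method: the only one an implementation must provide.
    List<List<String>> find(String[] sentences, Function<String, String[]> tokenizer);

    // Convenience overload: sentence-split first, then reuse the core method.
    default List<List<String>> find(String documentText,
                                    Function<String, String[]> sentenceDetector,
                                    Function<String, String[]> tokenizer) {
        return find(sentenceDetector.apply(documentText), tokenizer);
    }
}

// Illustrative implementation that only tokenizes (no name finding).
class TokenizingFinder implements SimpleDocumentFinder {
    @Override
    public List<List<String>> find(String[] sentences, Function<String, String[]> tokenizer) {
        List<List<String>> document = new ArrayList<>();
        for (String sentence : sentences) {
            document.add(Arrays.asList(tokenizer.apply(sentence)));
        }
        return document;
    }
}
```

With this shape, an implementation such as DefaultLinkableDocumentNameFinderImpl would only need to carry the real logic once, while the convenience overloads stay trivial.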
Notice the Document return type. Here is what a Document object looks
like:
import java.util.ArrayList;
import java.util.List;

public class Document {
    private List<Sentence> sentences = new ArrayList<>();

    public List<Sentence> getSentences() {
        return sentences;
    }

    public void setSentences(List<Sentence> sentences) {
        this.sentences = sentences;
    }
}
Notice the Sentence type. Here it is:
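import java.util.ArrayList;
import java.util.List;

import opennlp.tools.util.Span;

public class Sentence {
    private String sentenceText;
    private Integer sentenceNumber;
    // tokens of this sentence
    private List<String> tokens = new ArrayList<>();
    // token-level spans, e.g. names found by the TokenNameFinders
    private List<Span> spans = new ArrayList<>();

    public Sentence(String sentenceText, Integer sentenceNumber) {
        this.sentenceNumber = sentenceNumber;
        this.sentenceText = sentenceText;
    }

    // setters and getters omitted
}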
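The tokens and spans fields only make sense together: a Span returned by `TokenNameFinder.find(String[])` holds token indices, with the start inclusive and the end exclusive. The sketch below shows how a linker might recover the name text to look up. To keep it runnable without OpenNLP on the classpath, `TokenSpan` is a hypothetical stand-in for `opennlp.tools.util.Span` (which exposes `getStart()`, `getEnd()` and `getType()`); OpenNLP's own `Span.spansToStrings(Span[], String[])` covers the same case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of how a Sentence's tokens and spans relate.
 * TokenSpan stands in for opennlp.tools.util.Span.
 */
class SpanSketch {

    static final class TokenSpan {
        final int start;   // index of the first covered token (inclusive)
        final int end;     // index after the last covered token (exclusive)
        final String type; // entity type, e.g. "person" or "location"

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Joins the tokens covered by each span into the text a linker would look up.
    static List<String> coveredText(List<String> tokens, List<TokenSpan> spans) {
        List<String> names = new ArrayList<>();
        for (TokenSpan span : spans) {
            names.add(String.join(" ", tokens.subList(span.start, span.end)));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Alice", "Smith", "visited", "New", "York", ".");
        List<TokenSpan> spans = Arrays.asList(
                new TokenSpan(0, 2, "person"),
                new TokenSpan(3, 5, "location"));
        System.out.println(coveredText(tokens, spans)); // prints [Alice Smith, New York]
    }
}
```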
Mark Giaconia