Martin Wiesner created OPENNLP-1767:
---------------------------------------
             Summary: Fix sentence detection when an abbreviation overlaps at sentence end
                 Key: OPENNLP-1767
                 URL: https://issues.apache.org/jira/browse/OPENNLP-1767
             Project: OpenNLP
          Issue Type: Bug
          Components: Sentence Detector
    Affects Versions: 2.5.5
            Reporter: Martin Wiesner
            Assignee: Martin Wiesner
             Fix For: 2.5.6, 3.0.0

Currently, sentence detection works incorrectly when an abbreviation dictionary containing common abbreviations is loaded. If an abbreviation such as "S." (German for "page") overlaps with the end of a sentence, the actual sentence end is not detected and the subsequent sentence is glued to the previous one. As a result, the detected sentences do not match the expected sentence boundaries.

Examples for the German language:
- "Die Frage wurde gestellt. Sie wurde beantwortet." (English: "The question was asked. It was answered.")
- "Es lag am DBMS. Die Performance muss verbessert werden." (English: "It was due to the DBMS. The performance must be improved.")

A reproducer can easily be constructed via a JUnit test for {{SentenceDetectorMEGermanTest}}.

Note: This affects all other languages as well, so the issue has a higher priority than usual.
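
To illustrate the kind of overlap described above, here is a minimal, self-contained sketch in plain Java. It is not OpenNLP code and does not claim to show how the sentence detector is implemented or fixed. The class `AbbreviationOverlapSketch` and all of its methods are hypothetical names made up for this illustration, and the simple period-based splitter is an assumption. The sketch only shows how a suffix-style abbreviation check can mistake the end of "DBMS." for the abbreviation "S.", and how requiring the abbreviation to cover the whole token avoids that. It covers the "DBMS." example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of this is OpenNLP API.
class AbbreviationOverlapSketch {

    // Naive check: the text up to and including the period merely ENDS WITH an abbreviation.
    public static boolean isAbbreviationNaive(String text, int periodIdx, Set<String> abbreviations) {
        String prefix = text.substring(0, periodIdx + 1);
        for (String abbreviation : abbreviations) {
            if (prefix.endsWith(abbreviation)) {
                return true; // "DBMS." ends with "S.", so this fires although "DBMS" is a full word
            }
        }
        return false;
    }

    // Stricter check: the abbreviation must be the whole whitespace-delimited token.
    public static boolean isAbbreviationStrict(String text, int periodIdx, Set<String> abbreviations) {
        int tokenStart = text.lastIndexOf(' ', periodIdx) + 1;
        return abbreviations.contains(text.substring(tokenStart, periodIdx + 1));
    }

    // Very simple splitter: a period followed by a space (or the end of the text)
    // ends a sentence unless the check above reports an abbreviation.
    public static List<String> split(String text, Set<String> abbreviations, boolean strict) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '.') {
                continue;
            }
            boolean atEnd = i == text.length() - 1;
            if (!atEnd && text.charAt(i + 1) != ' ') {
                continue; // period inside a token, e.g. a decimal number
            }
            boolean abbreviation = strict
                    ? isAbbreviationStrict(text, i, abbreviations)
                    : isAbbreviationNaive(text, i, abbreviations);
            if (abbreviation && !atEnd) {
                continue; // treated as an abbreviation, so no boundary here
            }
            sentences.add(text.substring(start, i + 1).trim());
            start = i + 1;
        }
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest); // trailing text without a final period
        }
        return sentences;
    }

    public static void main(String[] args) {
        Set<String> abbreviations = Set.of("S.");
        String text = "Es lag am DBMS. Die Performance muss verbessert werden.";
        // Naive check: both sentences are glued together (one sentence).
        System.out.println(split(text, abbreviations, false));
        // Strict check: the boundary after "DBMS." is kept (two sentences).
        System.out.println(split(text, abbreviations, true));
    }
}
```

With the strict check, a genuine use of the abbreviation, as in "Siehe S. 5 im Buch. Danke.", is still not treated as a sentence end, because "S." is then the complete token.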