Martin Wiesner created OPENNLP-1767:
---------------------------------------

             Summary: Fix sentence detection when an abbreviation overlaps at 
sentence end
                 Key: OPENNLP-1767
                 URL: https://issues.apache.org/jira/browse/OPENNLP-1767
             Project: OpenNLP
          Issue Type: Bug
          Components: Sentence Detector
    Affects Versions: 2.5.5
            Reporter: Martin Wiesner
            Assignee: Martin Wiesner
             Fix For: 2.5.6, 3.0.0


Currently, sentence detection works incorrectly when an abbreviation dictionary 
containing common abbreviations is loaded. If an abbreviation such as "S." 
(the German abbreviation for "page") overlaps with the end of a sentence, the 
actual sentence boundary is not detected and the subsequent sentence is glued 
to the previous one. As a result, the detected sentences do not match the 
expected ones.

Examples for the German language:
- "Die Frage wurde gestellt. Sie wurde beantwortet."
- "Es lag am DBMS. Die Performance muss verbessert werden."
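
The overlap in the second example can be illustrated with a small, 
self-contained sketch. This is not OpenNLP code and makes no claim about the 
actual cause inside the sentence detector. It is a simplified, hypothetical 
model that assumes a dictionary holding only "S." and a plain suffix match at 
each period. The class and method names are made up for illustration. Under 
these assumptions, the suffix match treats the tail of "DBMS." as the 
abbreviation "S." and suppresses the boundary, whereas a check that requires 
the abbreviation to start at a token boundary keeps it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of OpenNLP.
public class AbbreviationOverlapSketch {

    // Assumed abbreviation dictionary with a single entry: "S." (page).
    static final Set<String> ABBREVIATIONS = Set.of("S.");

    // Naive check: the text up to and including the period ends with an abbreviation.
    static boolean endsWithAbbreviationNaive(String text, int periodIndex) {
        String prefix = text.substring(0, periodIndex + 1);
        for (String abbreviation : ABBREVIATIONS) {
            if (prefix.endsWith(abbreviation)) {
                return true;
            }
        }
        return false;
    }

    // Boundary-aware check: the abbreviation must also start at a token boundary,
    // i.e. at the start of the text or directly after whitespace.
    static boolean endsWithAbbreviationAtTokenStart(String text, int periodIndex) {
        String prefix = text.substring(0, periodIndex + 1);
        for (String abbreviation : ABBREVIATIONS) {
            if (prefix.endsWith(abbreviation)) {
                int start = prefix.length() - abbreviation.length();
                if (start == 0 || Character.isWhitespace(prefix.charAt(start - 1))) {
                    return true;
                }
            }
        }
        return false;
    }

    // Splits at every period that is not classified as part of an abbreviation.
    static List<String> split(String text, boolean boundaryAware) {
        List<String> sentences = new ArrayList<>();
        int sentenceStart = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '.') {
                continue;
            }
            boolean isAbbreviation = boundaryAware
                    ? endsWithAbbreviationAtTokenStart(text, i)
                    : endsWithAbbreviationNaive(text, i);
            if (!isAbbreviation) {
                sentences.add(text.substring(sentenceStart, i + 1).trim());
                sentenceStart = i + 1;
            }
        }
        String rest = text.substring(sentenceStart).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Es lag am DBMS. Die Performance muss verbessert werden.";
        System.out.println(split(text, false)); // naive suffix match
        System.out.println(split(text, true));  // boundary-aware match
    }
}
```

With the naive match, the sketch returns the whole text as a single sentence. 
With the boundary-aware match, it returns two sentences, while a genuine use 
such as "Siehe S. 5 im Handbuch." is still kept together. The sketch models 
only the "DBMS." case, not the first example.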

A reproducer can easily be constructed via a JUnit test for 
{{SentenceDetectorMEGermanTest}}.

Note:
The problem is not specific to German; it affects all other languages as well. 
Therefore, it warrants a higher priority than usual.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)