[ 
https://issues.apache.org/jira/browse/OPENNLP-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Wiesner resolved OPENNLP-1781.
-------------------------------------
    Resolution: Fixed

> SentenceDetectorME throws StringIndexOutOfBoundsException when sentence 
> starts with an abbreviation
> ---------------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-1781
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1781
>             Project: OpenNLP
>          Issue Type: Bug
>    Affects Versions: 2.5.6
>            Reporter: Richard Zowalla
>            Assignee: Richard Zowalla
>            Priority: Major
>             Fix For: 2.5.7, 3.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When an abbreviation appears at the beginning of a sentence, OpenNLP 2.5.6's 
> SentenceDetectorME can throw a java.lang.StringIndexOutOfBoundsException.
> This issue can be reproduced with a test like the following:
> {code:java}
> @Test
> void testSentDetectWithAbbreviationsAtSentenceStart() {
>   prepareResources(true);
>   final String sent1 = "S. Träume sind eine Verbindung von Gedanken.";
>   // The input is a single sentence that starts with the abbreviation "S.".
>   String[] sents = sentenceDetector.sentDetect(sent1);
>   double[] probs = sentenceDetector.probs();
>   assertAll(
>       () -> assertEquals(1, sents.length),
>       () -> assertEquals(sent1, sents[0]),
>       () -> assertEquals(1, probs.length));
> }{code}
> A practical scenario where an abbreviation might appear at the start of a 
> sentence is when using an ICD-10 code in a medical context.
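
For callers who cannot yet move to a release containing the fix, one possible stopgap is to guard the detection call and fall back to treating the whole input as a single sentence. The sketch below is only an illustration under stated assumptions: the class `SafeSentenceSplit` and the method `detectOrWhole` are hypothetical names that are not part of OpenNLP, and the detector is passed in as a plain `Function` so the example runs without the library on the classpath.

```java
import java.util.function.Function;

public class SafeSentenceSplit {

    /**
     * Hypothetical helper (not an OpenNLP API): applies the given detector and,
     * if it throws StringIndexOutOfBoundsException, returns the whole text as
     * one sentence instead of propagating the exception.
     */
    public static String[] detectOrWhole(Function<String, String[]> detector, String text) {
        try {
            return detector.apply(text);
        } catch (StringIndexOutOfBoundsException e) {
            // Fallback: keep the input intact as a single sentence.
            return new String[] { text };
        }
    }

    public static void main(String[] args) {
        // Stand-in detector that simulates the reported failure. With OpenNLP on
        // the classpath, a method reference such as sentenceDetector::sentDetect
        // could presumably be passed here instead (assumption, not verified).
        Function<String, String[]> failing = t -> {
            throw new StringIndexOutOfBoundsException("simulated failure");
        };

        String text = "S. Träume sind eine Verbindung von Gedanken.";
        String[] sents = detectOrWhole(failing, text);
        System.out.println(sents.length + " sentence(s) returned");
    }
}
```

When the fallback path is taken, no real detection has happened, so any probabilities reported by the detector afterwards should not be relied on.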



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
