[
https://issues.apache.org/jira/browse/OPENNLP-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martin Wiesner resolved OPENNLP-1781.
-------------------------------------
Resolution: Fixed
> SentenceDetectorME throws StringIndexOutOfBoundsException when sentence
> starts with an abbreviation
> ---------------------------------------------------------------------------------------------------
>
> Key: OPENNLP-1781
> URL: https://issues.apache.org/jira/browse/OPENNLP-1781
> Project: OpenNLP
> Issue Type: Bug
> Affects Versions: 2.5.6
> Reporter: Richard Zowalla
> Assignee: Richard Zowalla
> Priority: Major
> Fix For: 2.5.7, 3.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When an abbreviation appears at the beginning of a sentence, OpenNLP 2.5.6's
> SentenceDetectorME can throw a java.lang.StringIndexOutOfBoundsException.
> This issue can be reproduced with a test like the following:
> {code:java}
> @Test
> void testSentDetectWithAbbreviationsAtSentenceStart() {
>   prepareResources(true);
>   final String sent1 = "S. Träume sind eine Verbindung von Gedanken.";
>   // The input is a single sentence that begins with the abbreviation "S.".
>   String[] sents = sentenceDetector.sentDetect(sent1);
>   double[] probs = sentenceDetector.probs();
>   assertAll(
>       () -> assertEquals(1, sents.length),
>       () -> assertEquals(sent1, sents[0]),
>       () -> assertEquals(1, probs.length));
> }{code}
> A practical scenario where an abbreviation might appear at the start of a
> sentence is when using an ICD-10 code in a medical context.
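
The report does not state the root cause, so the following is only a generic,
self-contained sketch of how an abbreviation lookup at a candidate period can
stay within bounds when the abbreviation sits at index 0. It is not the
OpenNLP implementation: the class AbbreviationAtStartSketch, the helper
endsWithAbbreviation, and the abbreviation set are hypothetical names chosen
for illustration.

```java
import java.util.Set;

public class AbbreviationAtStartSketch {

    /**
     * Hypothetical helper (not part of OpenNLP): reports whether the text up to
     * and including periodIndex ends with one of the given abbreviations.
     */
    static boolean endsWithAbbreviation(String text, int periodIndex, Set<String> abbreviations) {
        for (String abbreviation : abbreviations) {
            int start = periodIndex + 1 - abbreviation.length();
            // Near the start of the text, start can be negative. An unguarded
            // text.substring(start, periodIndex + 1) would then throw a
            // StringIndexOutOfBoundsException, so such candidates are skipped.
            if (start < 0) {
                continue;
            }
            // The match must begin at the start of the text or right after whitespace.
            boolean atTokenStart = start == 0 || Character.isWhitespace(text.charAt(start - 1));
            if (atTokenStart && text.regionMatches(start, abbreviation, 0, abbreviation.length())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String sent1 = "S. Träume sind eine Verbindung von Gedanken.";
        // Assumed abbreviation entries, for illustration only.
        Set<String> abbreviations = Set.of("S.", "Dr.", "z. B.");
        // Period right after the leading "S": abbreviation at index 0.
        System.out.println(endsWithAbbreviation(sent1, 1, abbreviations)); // true
        // Final period of the sentence: not preceded by an abbreviation.
        System.out.println(endsWithAbbreviation(sent1, sent1.length() - 1, abbreviations)); // false
    }
}
```

The guard on a negative start index is the point of the sketch: entries such as
"Dr." are longer than the text before the first period, and skipping them keeps
the lookup safe without special-casing the first sentence.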