Martin Wiesner created OPENNLP-1555:
---------------------------------------

Summary: TokenizerME should detect multi-dot abbreviations
Key: OPENNLP-1555
URL: https://issues.apache.org/jira/browse/OPENNLP-1555
Project: OpenNLP
Issue Type: Improvement
Components: Tokenizer
Affects Versions: 2.3.3, 2.3.2, 2.3.1, 2.3.0, 2.2.0, 2.1.0
Reporter: Martin Wiesner
Assignee: Martin Wiesner
Fix For: 2.3.4

TokenizerME should detect and handle multi-dot abbreviations correctly. Currently it does not: for instance, German "z.B." ("zum Beispiel", for example) and Dutch "e.v." ("en volgende", and following) are not tokenized correctly, and extra tokens are returned.

NOTE: The examples above contain no whitespace between the dots.

Aims:
* Fix the detection and handling of multi-dot abbreviations
* Provide test cases that cover these cases
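
To make the expected behaviour concrete, here is a minimal sketch in plain Java. It is not the TokenizerME implementation and does not use the OpenNLP API; the class `MultiDotAbbreviationSketch`, its methods, and the whitespace-based splitting are assumptions made purely for illustration. It shows the desired outcome: an abbreviation such as "z.B." or "e.v." stays a single token, while a sentence-final dot is still split off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Illustrative sketch only; hypothetical names, not OpenNLP code. */
final class MultiDotAbbreviationSketch {

    // Two or more letter groups, each followed by a dot, with no
    // whitespace in between, e.g. "z.B." or "e.v.".
    private static final Pattern MULTI_DOT =
            Pattern.compile("(?:\\p{L}+\\.){2,}");

    /** Returns true if the whole token has the multi-dot shape. */
    static boolean looksLikeMultiDotAbbreviation(String token) {
        return MULTI_DOT.matcher(token).matches();
    }

    /**
     * Splits on whitespace, then separates a trailing dot from each chunk
     * unless the chunk is listed in the given abbreviation set.
     */
    static List<String> tokenize(String text, Set<String> abbreviations) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue; // empty input yields a single empty chunk
            }
            boolean keepWhole = abbreviations.contains(chunk)
                    || !chunk.endsWith(".")
                    || chunk.length() == 1;
            if (keepWhole) {
                tokens.add(chunk);
            } else {
                // Split the final dot off as its own token.
                tokens.add(chunk.substring(0, chunk.length() - 1));
                tokens.add(".");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> abbreviations = Set.of("z.B.", "e.v.");
        // prints [Das, gilt, z.B., hier, .]
        System.out.println(tokenize("Das gilt z.B. hier.", abbreviations));
    }
}
```

A test case for the actual fix could assert the same property against the real tokenizer: the token list for a sentence containing "z.B." or "e.v." should contain the abbreviation exactly once and no stray "." or partial tokens derived from it.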