[ https://issues.apache.org/jira/browse/OPENNLP-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794612#comment-17794612 ]
ASF GitHub Bot commented on OPENNLP-1479:
-----------------------------------------
mawiesne commented on code in PR #559:
URL: https://github.com/apache/opennlp/pull/559#discussion_r1420180322
##########
opennlp-tools/src/test/java/opennlp/tools/tokenize/TokenizerMEIT.java:
##########
@@ -36,4 +36,16 @@ void testTokenizerDownloadedModel() throws IOException {
     Assertions.assertEquals(",", tokens[1]);
   }
+  @Test
+  void testTokenizerDownloadedModelDe() throws IOException {
Review Comment:
@l-ma As far as I can see, this integration test isn't required: it merely
replicates what your enhanced unit test already covers. Please revert
`TokenizerMEIT.java` to its original form.
> Write better tests for pattern verification (tokenizers)
> --------------------------------------------------------
>
> Key: OPENNLP-1479
> URL: https://issues.apache.org/jira/browse/OPENNLP-1479
> Project: OpenNLP
> Issue Type: Improvement
> Components: Tokenizer
> Affects Versions: 2.1.1
> Reporter: Bruno P. Kinoshita
> Assignee: Lara Marinov
> Priority: Major
> Fix For: 2.3.2
>
>
> From [https://github.com/apache/opennlp/pull/516#issuecomment-1455015772]
> At the moment our tests verify that the tokenizer objects are created
> correctly (i.e. they test getters and setters, constructors, etc.), without
> verifying the actual behavior when used in conjunction with other classes
> (factory, tokenizer, trainers, etc.).
> It would be best to test the patterns used in the factories for different
> languages with some interesting sample data (maybe something from Project
> Gutenberg, open source news sites, etc.).
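
To illustrate the kind of pattern verification the issue asks for, here is a
minimal, self-contained sketch that checks per-language alphanumeric patterns
against sample tokens. It uses only java.util.regex and does not call OpenNLP.
The class name, the `patternFor` and `isAlphanumeric` helpers, and both regular
expressions are hypothetical and are not taken from the OpenNLP factories; a
real test would obtain the pattern from the tokenizer factory under test and
use richer sample data.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names or patterns come from OpenNLP.
final class AlphanumericPatternSketch {

  // Assumed per-language patterns, written with Unicode escapes so the
  // source compiles regardless of the platform's default encoding.
  // "de" additionally accepts the German umlauts and the sharp s.
  static final Map<String, Pattern> PATTERNS = Map.of(
      "en", Pattern.compile("^[A-Za-z0-9]+$"),
      "de", Pattern.compile("^[A-Za-z0-9\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df]+$"));

  // Falls back to the English pattern for unknown language codes.
  static Pattern patternFor(String languageCode) {
    return PATTERNS.getOrDefault(languageCode, PATTERNS.get("en"));
  }

  // True if the whole token matches the pattern assumed for the language.
  static boolean isAlphanumeric(String languageCode, String token) {
    return patternFor(languageCode).matcher(token).matches();
  }

  public static void main(String[] args) {
    // Sample German tokens: "Strasse" spelled with sharp s, "ueber" spelled
    // with u-umlaut, a plain number, and a token with trailing punctuation.
    List<String> samples = List.of("Stra\u00dfe", "\u00fcber", "2023", "Haus,");
    for (String token : samples) {
      System.out.println(token
          + " de=" + isAlphanumeric("de", token)
          + " en=" + isAlphanumeric("en", token));
    }
  }
}
```

The point of the sketch is that the assertions target what the pattern accepts
and rejects for real-looking input, rather than only checking that a getter
returns the object that was set.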