[ 
https://issues.apache.org/jira/browse/OPENNLP-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793549#comment-17793549
 ] 

ASF GitHub Bot commented on OPENNLP-1479:
-----------------------------------------

mawiesne commented on PR #559:
URL: https://github.com/apache/opennlp/pull/559#issuecomment-1842388575

   Thx @l-ma, that's a first contribution of high value. I'm happy you found 
the Sigmund Freud text sample I added just recently. I will provide feedback 
on the German part as soon as I find some spare minutes.
   
   Meanwhile, feel free to add further test cases for other languages you are 
familiar with (that is, other than English). French could be interesting.
   
   @kinow might be able to provide feedback or examples for PT, ES and other 
languages from that family / group. He was involved in the topic some months 
back and opened the related Jira. 
   
   Just stack further commits on top of the existing test case, or squash 
locally and force-push to this branch. 




> Write better tests for pattern verification (tokenizers)
> --------------------------------------------------------
>
>                 Key: OPENNLP-1479
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1479
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Tokenizer
>    Affects Versions: 2.1.1
>            Reporter: Bruno P. Kinoshita
>            Priority: Major
>             Fix For: 2.3.2
>
>
> From [https://github.com/apache/opennlp/pull/516#issuecomment-1455015772]
> At the moment our tests verify that the tokenizer objects are created 
> correctly (i.e. tests getters and setters, constructor, etc.), without 
> verifying the actual behavior when used in conjunction with other classes 
> (factory, tokenizer, trainers, etc).
> It would be best to test the patterns used in the factories for different 
> languages with some interesting sample data (maybe something from Project 
> Gutenberg, open-source news sites, etc.).
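
To illustrate the kind of pattern check the issue asks for, here is a minimal
sketch in plain Java. It does not use the OpenNLP API: the class name, both
patterns, and the sample tokens are assumptions made up for illustration, not
the patterns or sample data actually used in the project's factories or tests.

```java
import java.util.List;
import java.util.regex.Pattern;

public class AlphanumericPatternSketch {

    // Hypothetical ASCII-only pattern (an assumption, not taken from OpenNLP).
    static final Pattern ENGLISH_LIKE = Pattern.compile("^[A-Za-z0-9]+$");

    // Hypothetical pattern that also accepts German umlauts and sharp s.
    // Unicode escapes are used to stay independent of the source encoding.
    static final Pattern GERMAN_LIKE =
            Pattern.compile("^[A-Za-z0-9\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df]+$");

    // Returns true only if every token fully matches the given pattern.
    static boolean allMatch(Pattern pattern, List<String> tokens) {
        for (String token : tokens) {
            if (!pattern.matcher(token).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up German tokens: "Traeume" with a-umlaut, "gross" with sharp s.
        List<String> german = List.of("Tr\u00e4ume", "gro\u00df", "Deutung");

        System.out.println("German-like pattern accepts all tokens: "
                + allMatch(GERMAN_LIKE, german));
        System.out.println("ASCII-only pattern accepts all tokens: "
                + allMatch(ENGLISH_LIKE, german));
    }
}
```

A behavioral test along these lines would feed language-specific sample text
through the pattern under test and assert on the outcome, rather than only
checking that the objects were constructed.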



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
