[ 
https://issues.apache.org/jira/browse/OPENNLP-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258799#comment-15258799
 ] 

Joern Kottmann edited comment on OPENNLP-837 at 4/26/16 7:55 PM:
-----------------------------------------------------------------

I have now reviewed this as well and think the solution is good. As far as I
can see, this work is not specific to the doccat component. I suggest we update
this Jira issue slightly to reflect that.

The CLI should display appropriate warning messages depending on the quantity
of training data. We could develop a simple grading system that prints advice
based on the amount provided by the user. For example, a name finder model can
be trained with 250 sentences, but its performance will be poor in most cases.
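
To make the grading idea concrete, here is a rough sketch. The class name,
the grade names and all thresholds are made up for illustration and are not
part of OpenNLP; the only data point taken from this discussion is that 250
sentences should end up in the "poor" bucket.

```java
/** Hypothetical helper for illustration only; not an OpenNLP class. */
final class TrainingDataGrader {

    /** Coarse grades for the amount of training data. */
    enum Grade { INSUFFICIENT, POOR, FAIR, GOOD }

    // Assumed thresholds (number of training samples). Real values would
    // have to be chosen per component, e.g. name finder vs. doccat.
    static final int MIN_SAMPLES = 100;
    static final int POOR_BELOW = 1_000;
    static final int FAIR_BELOW = 15_000;

    /** Maps a sample count to a grade using the assumed thresholds. */
    static Grade grade(int sampleCount) {
        if (sampleCount < MIN_SAMPLES) return Grade.INSUFFICIENT;
        if (sampleCount < POOR_BELOW) return Grade.POOR;
        if (sampleCount < FAIR_BELOW) return Grade.FAIR;
        return Grade.GOOD;
    }

    /** Builds the message the CLI could print for a given sample count. */
    static String advice(int sampleCount) {
        switch (grade(sampleCount)) {
            case INSUFFICIENT:
                return "ERROR: only " + sampleCount
                        + " samples; this is not enough to train a model.";
            case POOR:
                return "WARNING: only " + sampleCount
                        + " samples; training works, but performance will"
                        + " be poor in most cases.";
            case FAIR:
                return "NOTE: " + sampleCount
                        + " samples; more data will likely improve the model.";
            default:
                return "OK: " + sampleCount + " samples.";
        }
    }
}
```

A trainer tool could count the samples once and print
TrainingDataGrader.advice(count) to stderr before training starts.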



was (Author: joern):
Just reviewed this as well now and think the solution is good. As far as I see 
this work is not specific to the doccat component. I suggest we update this 
jira slightly to reflect that.

> Let Doccat fail when non-sufficient amounts of training data are provided for 
> training
> --------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-837
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-837
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Doccat
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 1.6.1
>
>         Attachments: OPENNLP-837.patch
>
>
> When the amount of training data is not sufficient to train a Doccat model,
> the user should be made aware of that with an informative message, e.g. a
> warning when using the command line, or an exception when calling the APIs
> programmatically.
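
The split described in the issue (exception for API callers, warning on the
command line) could look roughly like the sketch below. All names and the
minimum of 100 samples are assumptions for illustration; this is not the
attached patch and not the OpenNLP API.

```java
import java.io.PrintStream;

/** Hypothetical exception type, defined here only for the example. */
class TooFewSamplesException extends Exception {
    TooFewSamplesException(String message) {
        super(message);
    }
}

/** Hypothetical check shared by the API and the command line tool. */
final class TrainingDataCheck {

    // Assumed minimum; a real value would depend on the component.
    static final int MIN_SAMPLES = 100;

    /** API path: fail with an exception when there is too little data. */
    static void requireEnough(int sampleCount) throws TooFewSamplesException {
        if (sampleCount < MIN_SAMPLES) {
            throw new TooFewSamplesException("Insufficient training data: got "
                    + sampleCount + " samples, need at least " + MIN_SAMPLES + ".");
        }
    }

    /**
     * CLI path: turn the exception into a warning on the given stream.
     * Returns true if training may proceed.
     */
    static boolean checkForCli(int sampleCount, PrintStream err) {
        try {
            requireEnough(sampleCount);
            return true;
        } catch (TooFewSamplesException e) {
            err.println("WARNING: " + e.getMessage());
            return false;
        }
    }
}
```

Keeping the check in one place means the command line tool only decides how
to report the problem, while programmatic callers still get the exception.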



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
