[
https://issues.apache.org/jira/browse/OPENNLP-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258799#comment-15258799
]
Joern Kottmann edited comment on OPENNLP-837 at 4/26/16 7:55 PM:
-----------------------------------------------------------------
Just reviewed this as well and I think the solution is good. As far as I can
see, this work is not specific to the doccat component. I suggest we update
this Jira issue slightly to reflect that.
The CLI should display appropriate warning messages depending on the quantity
of the training data. We could develop a simple grading system which prints out
advice based on the amount provided by the user. For example, a name finder
model can be trained with 250 sentences, but its performance will be poor in
most cases.
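A rough sketch of what such a grading could look like. The class name, the
grade names and all thresholds are made up for illustration; none of this is
part of OpenNLP or of the attached patch, and real thresholds would have to be
worked out per component.

```java
import java.util.Locale;

public class TrainingDataAdvice {

    /** Hypothetical grades, not an OpenNLP type. */
    public enum Grade { INSUFFICIENT, POOR, ACCEPTABLE, GOOD }

    // Placeholder thresholds (number of training samples). These are
    // assumptions for the sketch, not values taken from OpenNLP.
    private static final int MIN_TRAINABLE = 50;
    private static final int MIN_ACCEPTABLE = 1000;
    private static final int MIN_GOOD = 15000;

    /** Maps the amount of training data to a grade. */
    public static Grade grade(int sampleCount) {
        if (sampleCount < MIN_TRAINABLE) return Grade.INSUFFICIENT;
        if (sampleCount < MIN_ACCEPTABLE) return Grade.POOR;
        if (sampleCount < MIN_GOOD) return Grade.ACCEPTABLE;
        return Grade.GOOD;
    }

    /** Builds the message the CLI could print for a given component. */
    public static String advice(String component, int sampleCount) {
        Grade grade = grade(sampleCount);
        String hint;
        switch (grade) {
            case INSUFFICIENT:
                hint = "too little data to train a usable model";
                break;
            case POOR:
                hint = "training will work, but expect poor performance in most cases";
                break;
            case ACCEPTABLE:
                hint = "usable, more data should still help";
                break;
            default:
                hint = "amount of training data looks fine";
                break;
        }
        return String.format(Locale.ROOT, "%s: %d training samples (%s): %s",
                component, sampleCount, grade, hint);
    }

    public static void main(String[] args) {
        // The 250 sentence name finder case from the comment above.
        System.err.println(advice("name finder", 250));
    }
}
```

With these placeholder thresholds the 250 sentence name finder case ends up in
the POOR grade: training goes through, but the user is told what to expect.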
> Let Doccat fail when non-sufficient amounts of training data are provided for
> training
> --------------------------------------------------------------------------------------
>
> Key: OPENNLP-837
> URL: https://issues.apache.org/jira/browse/OPENNLP-837
> Project: OpenNLP
> Issue Type: Bug
> Components: Doccat
> Reporter: Tommaso Teofili
> Assignee: Tommaso Teofili
> Fix For: 1.6.1
>
> Attachments: OPENNLP-837.patch
>
>
> When the amount of training data is insufficient to train a Doccat model,
> the user should be made aware of that with an informative message, e.g. a
> warning when using the command line, or an exception when calling the APIs
> programmatically.