[ https://issues.apache.org/jira/browse/OPENNLP-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen resolved OPENNLP-177.
-----------------------------------

    Resolution: Fixed

Created the command-line tool:

$ bin/opennlp DoccatCrossValidator
Usage: opennlp DoccatCrossValidator[.leipzig] [-reportOutputFile outputFile] 
[-misclassified true|false] [-folds num] [-params paramsFile] -lang language 
-data sampleData [-encoding charsetName]

Arguments description:
        -reportOutputFile outputFile
                the path of the fine-grained report file.
        -misclassified true|false
                if true will print false negatives and false positives.
        -folds num
                number of folds, default is 10.
        -params paramsFile
                training parameters file.
        -lang language
                language which is being processed.
        -data sampleData
                data to be used, usually a file name.
        -encoding charsetName
                encoding for reading and writing text, if absent the system
                default is used.
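
To illustrate what the -folds argument controls, here is a minimal Python sketch of k-fold partitioning. It is an illustration only and not the tool's actual implementation: the function name k_fold_splits is hypothetical, and the round-robin assignment of samples to folds is an assumption about one common way to split the data.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(samples: Sequence[T], folds: int = 10) -> List[Tuple[List[T], List[T]]]:
    """Return one (train, test) pair per fold.

    Each sample lands in exactly one test set; the remaining samples
    form the training set for that fold. Assignment is round-robin by
    index (an assumption made for this sketch).
    """
    if folds < 2 or folds > len(samples):
        raise ValueError("folds must be between 2 and the number of samples")
    splits = []
    for k in range(folds):
        test = [s for i, s in enumerate(samples) if i % folds == k]
        train = [s for i, s in enumerate(samples) if i % folds != k]
        splits.append((train, test))
    return splits
```

Each fold trains on the train part and evaluates on the held-out test part, so every sample is used for evaluation exactly once and no separate test set is needed.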

The file given by -reportOutputFile contains the detailed F-Measure for each
category and the confusion matrix.
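
As a rough illustration of the two quantities in that report, the following Python sketch builds a confusion matrix from (reference, predicted) category pairs and derives an F-measure per category. It assumes the standard F1 definition (harmonic mean of precision and recall); the function names and the category labels in the usage lines are hypothetical, and the tool's report layout may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def confusion_matrix(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each (reference, predicted) category pair occurs."""
    return Counter(pairs)


def f_measure_per_category(matrix: Counter) -> Dict[str, float]:
    """Compute F1 for every category that appears in the matrix."""
    categories = {cat for pair in matrix for cat in pair}
    scores = {}
    for cat in sorted(categories):
        tp = matrix[(cat, cat)]
        # False positives: predicted as cat, but the reference says otherwise.
        fp = sum(n for (ref, pred), n in matrix.items() if pred == cat and ref != cat)
        # False negatives: reference is cat, but something else was predicted.
        fn = sum(n for (ref, pred), n in matrix.items() if ref == cat and pred != cat)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[cat] = 2 * precision * recall / total if total else 0.0
    return scores


# Hypothetical labels, for illustration only.
pairs = [
    ("sports", "sports"),
    ("sports", "sports"),
    ("sports", "politics"),
    ("politics", "politics"),
]
matrix = confusion_matrix(pairs)
scores = f_measure_per_category(matrix)
```

The off-diagonal cells of the matrix, such as ("sports", "politics") here, are the misclassified samples that -misclassified true would print.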

> Add cross validation support to doccat
> --------------------------------------
>
>                 Key: OPENNLP-177
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-177
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Doccat
>            Reporter: Joern Kottmann
>            Assignee: William Colen
>
> Doccat should support cross validation in order to measure the performance on 
> a data set without test data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
