[ 
https://issues.apache.org/jira/browse/OPENNLP-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Colen updated OPENNLP-743:
----------------------------------
    Fix Version/s: 1.7.1

> The chunker training data format is incorrectly/insufficiently described.
> -------------------------------------------------------------------------
>
>                 Key: OPENNLP-743
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-743
>             Project: OpenNLP
>          Issue Type: Documentation
>          Components: Chunker
>    Affects Versions: 1.7.0
>            Reporter: Zuzana Neverilova
>            Priority: Minor
>              Labels: documentation, easyfix, newbie
>             Fix For: 1.7.1
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The chunker training data format is described as follows: "The train data
> consist of three columns separated by spaces. Each word has been put on a
> separate line and there is an empty line after each sentence." However, in the
> example, tokens and tags are separated by several spaces. First, the gaps look
> like tabs (which are not allowed); second, runs of several spaces are not
> allowed either (apparently, each line is split with String.split(" ")).
> Suggestion: emphasize that columns are separated by exactly one space and
> that tabs are not allowed.
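
To illustrate the behaviour the reporter describes, here is a minimal Java sketch of what happens when a line is split on a single space. It assumes, as the issue suggests but does not confirm, that the reader calls String.split(" ") on each line. The class ChunkerLineCheck, its methods, and the sample line are hypothetical and are not part of OpenNLP.

```java
import java.util.Arrays;

public class ChunkerLineCheck {

    // Hypothetical helper: split a training line on a single space,
    // which is what the issue assumes the chunker reader does.
    public static String[] columns(String line) {
        return line.split(" ");
    }

    // A line is usable only if the split yields exactly three
    // non-empty columns: token, POS tag, chunk tag.
    public static boolean isWellFormed(String line) {
        String[] cols = columns(line);
        if (cols.length != 3) {
            return false;
        }
        for (String col : cols) {
            if (col.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Rockwell NNP B-NP",      // single spaces
            "Rockwell   NNP   B-NP",  // runs of spaces produce empty columns
            "Rockwell\tNNP\tB-NP"     // tabs are not split at all
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(columns(s)) + " -> " + isWellFormed(s));
        }
    }
}
```

Only the first sample passes the check. With three spaces between columns the split returns seven elements, four of them empty, and with tabs it returns the whole line as a single element.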



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
