[
https://issues.apache.org/jira/browse/OPENNLP-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
William Colen updated OPENNLP-743:
----------------------------------
Fix Version/s: 1.7.1
> The chunker training data format is incorrectly/insufficiently described.
> -------------------------------------------------------------------------
>
> Key: OPENNLP-743
> URL: https://issues.apache.org/jira/browse/OPENNLP-743
> Project: OpenNLP
> Issue Type: Documentation
> Components: Chunker
> Affects Versions: 1.7.0
> Reporter: Zuzana Neverilova
> Priority: Minor
> Labels: documentation, easyfix, newbie
> Fix For: 1.7.1
>
> Original Estimate: 10m
> Remaining Estimate: 10m
>
> The chunker training data format is described as follows: "The train data
> consist of three columns separated by spaces. Each word has been put on a
> separate line and there is an empty line after each sentence." However, in the
> example, several spaces appear between each token and its tags. This is
> misleading in two ways: first, the gaps look like tabs, which are not allowed;
> second, runs of several spaces are not allowed either (apparently each line
> String is split with split(" ")). Suggestion: emphasize that columns are
> separated by exactly one space and that tabs are not allowed.
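
To illustrate the report, here is a minimal Java sketch of why a line with several spaces or tabs does not yield three columns when it is split on a single space. It assumes, as the reporter does, that the training reader calls split(" ") on each line. The class ChunkerLineCheck and its methods are hypothetical helpers written for this illustration and are not part of the OpenNLP API.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class ChunkerLineCheck {

    // Assumption taken from the report: each line is split on exactly one space.
    static String[] splitOnSingleSpace(String line) {
        return line.split(" ");
    }

    // A line is accepted if it is empty (sentence separator) or if splitting on
    // a single space gives exactly three non-empty, tab-free columns.
    static boolean isValidTrainingLine(String line) {
        if (line.isEmpty()) {
            return true;
        }
        String[] columns = splitOnSingleSpace(line);
        if (columns.length != 3) {
            return false;
        }
        for (String column : columns) {
            if (column.isEmpty() || column.contains("\t")) {
                return false;
            }
        }
        return true;
    }

    // Collapses any run of spaces or tabs into one space so the line can be
    // split on a single space afterwards.
    static String normalize(String line) {
        return String.join(" ", line.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String good = "Rockwell NNP B-NP";
        String doubleSpace = "Rockwell  NNP B-NP";
        String tabbed = "Rockwell\tNNP\tB-NP";

        // Two consecutive spaces produce an empty column in the middle.
        System.out.println(Arrays.toString(splitOnSingleSpace(doubleSpace)));
        System.out.println(isValidTrainingLine(good));
        System.out.println(isValidTrainingLine(doubleSpace));
        System.out.println(isValidTrainingLine(tabbed));
        System.out.println(normalize(tabbed));
    }
}
```

Running normalize over a training file before use is one possible workaround for data that was aligned with tabs or padding spaces. Whether the reader itself should be made more tolerant is a separate question from the documentation fix requested here.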