[ https://issues.apache.org/jira/browse/OPENNLP-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793470#comment-15793470 ]
ASF GitHub Bot commented on OPENNLP-743:
----------------------------------------
GitHub user wcolen opened a pull request:
https://github.com/apache/opennlp/pull/25
Updates docs to make it clear the Chunker training format
The documentation was unclear and led users to believe that any number
of spaces could be used as the column separator. This change updates the
documentation to state that only a single space is accepted.
See issue OPENNLP-743
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wcolen/opennlp 743
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/opennlp/pull/25.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #25
----
commit 3c998b6a0ede39aa4c15e7a2ccdc37d016f9202e
Author: William D C M SILVA
Date: 2017-01-02T20:41:20Z
Makes clear the Chunker training format
See issue OPENNLP-743
----
> The chunker training data format is incorrectly/insufficiently described.
> -------------------------------------------------------------------------
>
> Key: OPENNLP-743
> URL: https://issues.apache.org/jira/browse/OPENNLP-743
> Project: OpenNLP
> Issue Type: Documentation
> Components: Chunker
> Affects Versions: 1.7.0
> Reporter: Zuzana Neverilova
> Assignee: William Colen
> Priority: Minor
> Labels: documentation, easyfix, newbie
> Fix For: 1.7.1
>
> Original Estimate: 10m
> Remaining Estimate: 10m
>
> The chunker training data format is described as follows: "The train data
> consist of three columns separated by spaces. Each word has been put on a
> separate line and there is an empty line after each sentence." However, in the
> example, several spaces separate the tokens and tags. First, they look like
> tabs (which are not allowed); second, multiple spaces are not allowed either
> (apparently, the line String is split with split(" ")). Suggestion: emphasize
> that columns are separated by exactly one space and that tabs are not allowed.
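
To illustrate why extra spaces or tabs break a training line, here is a minimal Java sketch of the behavior the issue describes. It is not OpenNLP code: the class ChunkerLineCheck and its methods are hypothetical, the sample lines are made up for illustration, and the split uses a limit of -1 (unlike the plain split(" ") mentioned in the report) so that a trailing space is also caught.

```java
import java.util.Arrays;

// Hypothetical helper, not part of OpenNLP: checks one line of chunker
// training data against the "exactly one space between columns" rule.
final class ChunkerLineCheck {

    // Splits on a single space character. The -1 limit is an assumption of
    // this sketch: it keeps trailing empty fields so a trailing space shows up.
    static String[] columns(String line) {
        return line.split(" ", -1);
    }

    // A line is well formed when it contains no tab and yields exactly three
    // non-empty columns (token, POS tag, chunk tag).
    static boolean isWellFormed(String line) {
        if (line.indexOf('\t') >= 0) {
            return false;
        }
        String[] cols = columns(line);
        if (cols.length != 3) {
            return false;
        }
        for (String col : cols) {
            if (col.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative lines: single spaces, double spaces, and tabs.
        String[] samples = { "He PRP B-NP", "He  PRP  B-NP", "He\tPRP\tB-NP" };
        for (String sample : samples) {
            System.out.println(Arrays.toString(columns(sample))
                    + " -> " + isWellFormed(sample));
        }
    }
}
```

With two spaces between columns, the split produces empty strings between the real values, so the line no longer has three columns. A tab-separated line is not split at all. A check like this could be run over a training file before passing it to the trainer.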
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)