[jira] [Commented] (OPENNLP-743) The chunker training data format is incorrectly/insufficiently described.

ASF GitHub Bot (JIRA) Mon, 02 Jan 2017 12:47:15 -0800

    [ https://issues.apache.org/jira/browse/OPENNLP-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793470#comment-15793470 ]


ASF GitHub Bot commented on OPENNLP-743:
----------------------------------------

GitHub user wcolen opened a pull request:

    https://github.com/apache/opennlp/pull/25

    Updates docs to make the Chunker training format clear

    The documentation was not clear, leading the user to think that any
    number of spaces could be used as a column separator. This updates the
    documentation to state that only a single space is acceptable.
    
    See issue OPENNLP-743

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wcolen/opennlp 743

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/opennlp/pull/25.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #25
    
----
commit 3c998b6a0ede39aa4c15e7a2ccdc37d016f9202e
Author: William D C M SILVA <[email protected]>
Date:   2017-01-02T20:41:20Z

    Makes clear the Chunker training format
    
    See issue OPENNLP-743

----


> The chunker training data format is incorrectly/insufficiently described.
> -------------------------------------------------------------------------
>
>                 Key: OPENNLP-743
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-743
>             Project: OpenNLP
>          Issue Type: Documentation
>          Components: Chunker
>    Affects Versions: 1.7.0
>            Reporter: Zuzana Neverilova
>            Assignee: William Colen
>            Priority: Minor
>              Labels: documentation, easyfix, newbie
>             Fix For: 1.7.1
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The chunker training data format is described as follows: "The train data 
> consist of three columns separated by spaces. Each word has been put on a 
> separate line and there is an empty line after each sentence." However, in the 
> example, several spaces separate the tokens and tags. First, they look like 
> tabs (which are not allowed); second, multiple consecutive spaces are not 
> allowed either (apparently, each line String is split with split(" ")). 
> Suggestion: emphasize that columns are separated by exactly one space and 
> that tabs are not allowed.
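
The format rule described in the issue can be checked before training. The
following is a minimal sketch, assuming (as the reporter suggests) that each
line is split on exactly one space character. The class ChunkFormatCheck and
its methods are hypothetical helpers written for illustration; they are not
part of OpenNLP, and the sample lines are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: checks that chunker training
// lines have exactly three columns separated by exactly one space each.
final class ChunkFormatCheck {

    /** Returns true for an empty line (sentence boundary) or a well-formed three-column line. */
    static boolean isValidLine(String line) {
        if (line.isEmpty()) {
            return true; // an empty line marks the end of a sentence
        }
        if (line.indexOf('\t') >= 0) {
            return false; // tabs are not allowed as separators
        }
        // Limit -1 keeps trailing empty strings, so a trailing space is detected too.
        String[] columns = line.split(" ", -1);
        if (columns.length != 3) {
            return false; // too few columns, or extra spaces produced extra fields
        }
        for (String column : columns) {
            if (column.isEmpty()) {
                return false; // an empty field means a leading or doubled space
            }
        }
        return true;
    }

    /** Returns the 1-based numbers of all lines that violate the format. */
    static List<Integer> invalidLineNumbers(List<String> lines) {
        List<Integer> invalid = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!isValidLine(lines.get(i))) {
                invalid.add(i + 1);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // Illustrative sample: line 2 has two spaces, line 4 uses tabs.
        List<String> sample = Arrays.asList(
                "He PRP B-NP",
                "reckons  VBZ B-VP",
                "",
                "the\tDT\tB-NP");
        System.out.println("Invalid lines: " + invalidLineNumbers(sample));
    }
}
```

Running a check like this over a training file before invoking the trainer
would point to the offending line numbers directly, which is easier to act on
than a failure during training.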



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
