[jira] [Commented] (OPENNLP-1190) CONLL02 format

Martin Wiesner (Jira) Fri, 01 Sep 2023 07:55:05 -0700


    [ 
https://issues.apache.org/jira/browse/OPENNLP-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761330#comment-17761330
 ]


Martin Wiesner commented on OPENNLP-1190:
-----------------------------------------

I could successfully run

    opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per > es_corpus_train_persons.txt

as mentioned in the latest documentation and in this Jira issue. The run 
produced *no* error message, unlike what was reported in 2018, when this 
issue was opened.

Reason: The file "esp.train" has 3 columns, as expected (taken from: 
https://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html).
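
For anyone who wants to check the column count of their own copy of the data, 
a minimal sketch follows. It is not part of OpenNLP: the helper name 
check_columns is hypothetical, and it assumes that columns are separated by 
whitespace and that blank lines only separate sentences.

```shell
# Hypothetical helper (not part of OpenNLP): print every non-blank line
# whose number of whitespace-separated columns differs from the expected one.
# Assumption: blank lines are sentence separators and are skipped.
check_columns() {
  # $1 = data file, $2 = expected number of columns
  awk -v want="$2" 'NF > 0 && NF != want { printf "%d: %d columns\n", NR, NF }' "$1"
}

# Example call (prints nothing if every non-blank line has 3 columns):
# check_columns esp.train 3
```

Each output line gives a line number and the column count found there, so an 
empty result suggests the file matches the expected layout.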

Conclusion:

The issue is no longer valid, as the reported problem could not be 
reproduced.

> CONLL02 format
> --------------
>
>                 Key: OPENNLP-1190
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1190
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Formats
>    Affects Versions: tools-1.5.3
>            Reporter: Luca
>            Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> According to the documentation, the following should work:
>  bin/opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per > es_corpus_train_persons.txt
> However, it currently fails with an error message, since it expects 3 
> columns instead of the 2 that are in the dataset.
> This is a bug, introduced at line 130 of 
> opennlp.tools.formats.Conll02NameSampleStream.java, where a length of 3 is 
> imposed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
