[ https://issues.apache.org/jira/browse/OPENNLP-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zemerick closed OPENNLP-1363.
----------------------------------
> Verify the documentation of the lemmatizer input format
> -------------------------------------------------------
>
> Key: OPENNLP-1363
> URL: https://issues.apache.org/jira/browse/OPENNLP-1363
> Project: OpenNLP
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 2.1.0
> Reporter: Jeff Zemerick
> Assignee: Atita Arora
> Priority: Minor
> Fix For: 2.1.1
>
>
> In OPENNLP-1257, a change was proposed to update the code to split the
> lemmatizer input on spaces instead of tabs. I believe tab is the desired
> delimiter, but we need to make sure the documentation is consistent with that.
> Refer to
> https://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.lemmatizer
> and in particular to the following sentences:
> "The training data consist of three columns separated by spaces. Each word
> has been put on a separate line and there is an empty line after each
> sentence. The first column contains the current word, the second its
> part-of-speech tag and the third its lemma. Here is an example of the file
> format:"
> Determine whether the first of these sentences should read "separated by
> tabs" instead.
>
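
To make the delimiter question concrete, here is a minimal Java sketch of a reader for the three-column format quoted above (word, part-of-speech tag, lemma, with an empty line after each sentence). It is not OpenNLP code: the class LemmaTrainingFormat, its Row type, and the parse method are hypothetical names introduced for illustration, and the delimiter is passed in as a parameter so that tab and space splitting can be compared on the same input.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of OpenNLP: reads word / POS tag / lemma lines. */
final class LemmaTrainingFormat {

    /** One token row: word, part-of-speech tag, lemma. */
    static final class Row {
        final String word;
        final String posTag;
        final String lemma;

        Row(String word, String posTag, String lemma) {
            this.word = word;
            this.posTag = posTag;
            this.lemma = lemma;
        }
    }

    /**
     * Splits the text into sentences (closed by empty lines) and splits each
     * non-empty line into exactly three columns using the given delimiter regex.
     */
    static List<List<Row>> parse(String text, String delimiterRegex) {
        List<List<Row>> sentences = new ArrayList<>();
        List<Row> current = new ArrayList<>();
        for (String line : text.split("\\R", -1)) {
            if (line.trim().isEmpty()) {
                // An empty line closes the current sentence, if one is open.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] columns = line.split(delimiterRegex, -1);
            if (columns.length != 3) {
                throw new IllegalArgumentException(
                    "Expected 3 columns but found " + columns.length + ": " + line);
            }
            current.add(new Row(columns[0], columns[1], columns[2]));
        }
        // Keep a final sentence that is not followed by an empty line.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }
}
```

Calling LemmaTrainingFormat.parse(text, "\t") on tab-separated input returns one list of rows per sentence. The same call on space-separated input throws an IllegalArgumentException, because each line then yields a single column. This is the kind of mismatch between documentation and code that this issue asks to rule out.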