Jeff Zemerick created OPENNLP-1363:
--------------------------------------

             Summary: Verify the documentation of the lemmatizer input format
                 Key: OPENNLP-1363
                 URL: https://issues.apache.org/jira/browse/OPENNLP-1363
             Project: OpenNLP
          Issue Type: Task
          Components: Documentation
            Reporter: Jeff Zemerick


OPENNLP-1257 proposed changing the code to split the lemmatizer input on 
spaces instead of tabs. I believe tab is the intended delimiter, but we need 
to make sure the documentation is consistent with the code.

Refer to the lemmatizer section of the manual 
(https://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.lemmatizer), 
in particular the following sentences:

"The training data consist of three columns separated by spaces. Each word has 
been put on a separate line and there is an empty line after each sentence. The 
first column contains the current word, the second its part-of-speech tag and 
the third its lemma. Here is an example of the file format:"

Determine whether the first sentence should read "separated by tabs" instead.
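
To check which delimiter a given training file actually uses, a small script 
along these lines might help. This is only a sketch and not OpenNLP code: the 
helper name detect_delimiter and the sample lines are made up for 
illustration, and it assumes every non-empty line holds exactly three fields 
(word, part-of-speech tag, lemma) with no spaces inside a field.

```python
def detect_delimiter(lines):
    """Guess whether three-column lemmatizer training lines use tabs or spaces.

    Hypothetical helper, not part of OpenNLP. Returns "tab", "space",
    or "mixed/unknown".
    """
    tab_ok = True
    space_ok = True
    seen_data = False
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # an empty line marks the end of a sentence
        seen_data = True
        # A line fits a delimiter only if splitting on it yields 3 columns.
        if len(line.split("\t")) != 3:
            tab_ok = False
        if len(line.split(" ")) != 3:
            space_ok = False
    if not seen_data:
        return "mixed/unknown"
    if tab_ok and not space_ok:
        return "tab"
    if space_ok and not tab_ok:
        return "space"
    return "mixed/unknown"


# Illustrative samples only (assumed, not copied from a real training file).
sample_tabs = ["He\tPRP\the", "reckons\tVBZ\treckon", "", "The\tDT\tthe"]
sample_spaces = ["He PRP he", "reckons VBZ reckon", ""]

print(detect_delimiter(sample_tabs))
print(detect_delimiter(sample_spaces))
```

Running this over the example in the manual and over the training files used 
in the tests would show whether the documentation and the code agree.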

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
