[jira] [Commented] (OPENNLP-543) Documentation of OpenNLP Traning Format

James Kosin (JIRA) Tue, 30 Oct 2012 18:04:15 -0700

    [ 
https://issues.apache.org/jira/browse/OPENNLP-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487425#comment-13487425
 ]


James Kosin commented on OPENNLP-543:
-------------------------------------

Hmm...

I thought the Corpus Server was going to generate the corpus data in the 
correct formats?  Was I wrong, or is this still a work in progress?

If so, it would be better to publish a document on the OpenNLP formats and get 
the Corpus Server to implement them.

Marc, the POS and NER models rely on at least the sentence detector and the 
tokenizer having been run first.  So it is best to get familiar with those two 
before jumping into the NER or POS models.  You can use the same training data 
for all four if you like.

The sentence detector only requires that each complete sentence start on a new 
line.  The tokenizer requires a <SPLIT> between the tokens.

For example, "Wow!" becomes " <SPLIT> Wow <SPLIT> ! <SPLIT> " in the training 
file.
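
As a rough illustration of the two formats described above, here is a minimal Python sketch that turns raw sentences into tokenizer training lines, one sentence per line. It assumes the convention shown in the OpenNLP manual, where <SPLIT> marks a token boundary that is not already indicated by whitespace and is written without surrounding spaces. The regex-based tokenization and the function names are hypothetical stand-ins, not part of OpenNLP; real training data would come from hand-checked tokens.

```python
import re

SPLIT = "<SPLIT>"

def token_spans(sentence):
    """Hypothetical stand-in tokenizer: word runs and single punctuation marks."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", sentence)]

def to_training_line(sentence):
    """Join tokens with a space where the source had whitespace, else with <SPLIT>."""
    parts = []
    prev_end = None
    for start, end in token_spans(sentence):
        if prev_end is not None:
            # Whitespace in the original already marks the boundary;
            # adjacent tokens need an explicit <SPLIT> marker.
            parts.append(" " if start > prev_end else SPLIT)
        parts.append(sentence[start:end])
        prev_end = end
    return "".join(parts)

def to_training_file(sentences):
    """One sentence per line, as the sentence detector format also expects."""
    return "\n".join(to_training_line(s) for s in sentences) + "\n"

if __name__ == "__main__":
    print(to_training_file(['"Wow!"', "Pierre Vinken, 61 years old."]), end="")
```

Because the output keeps one sentence per line, the same file layout can serve as a starting point for sentence detector training data once the <SPLIT> markers are removed.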


                
> Documentation of OpenNLP Traning Format
> ---------------------------------------
>
>                 Key: OPENNLP-543
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-543
>             Project: OpenNLP
>          Issue Type: Bug
>            Reporter: Marc Schreiber
>
> Is there any documentation about the training formats which OpenNLP supports?
> I'm working on a project where we need our own models because the project 
> concentrates on specific domains. It would be really great if there were any 
> help for building your own models. 
> If there is no documentation, I would offer my help in creating it, 
> but I need someone to help me with the training formats.

