[ 
https://issues.apache.org/jira/browse/OPENNLP-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030394#comment-14030394
 ] 

Joern Kottmann commented on OPENNLP-701:
----------------------------------------

Sounds good. To train an OpenNLP component you have to provide an ObjectStream 
that produces the corresponding XyzSample objects (e.g. NameSample for the 
Name Finder).

The streams that parse a particular format live in the opennlp.tools.formats 
package, where you can see the existing implementations. I suggest you have a 
look at, for example, the Conll02NameSampleStream class. The other 
implementations are usually very similar, so it doesn't really matter which 
one you look at.
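
As a rough illustration of what such a format stream does, here is a minimal, 
dependency-free Java sketch. SampleStream, TaggedSentence and 
ColumnFormatSampleStream are made-up stand-ins for this example, not OpenNLP 
classes, and the "token tag" input layout is only an assumed CoNLL-like 
format. The real ObjectStream interface also declares reset() and close(), 
and its read() may throw IOException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for OpenNLP's ObjectStream<T>, reduced to read().
interface SampleStream<T> {
    // Returns the next sample, or null when the input is exhausted.
    T read();
}

// Hypothetical stand-in for a sample class such as NameSample.
final class TaggedSentence {
    final List<String> tokens;
    final List<String> tags;

    TaggedSentence(List<String> tokens, List<String> tags) {
        this.tokens = tokens;
        this.tags = tags;
    }
}

// Parses an assumed "token tag" column format: one token per line,
// with a blank line separating sentences.
final class ColumnFormatSampleStream implements SampleStream<TaggedSentence> {
    private final Iterator<String> lines;

    ColumnFormatSampleStream(Iterable<String> lines) {
        this.lines = lines.iterator();
    }

    @Override
    public TaggedSentence read() {
        List<String> tokens = new ArrayList<>();
        List<String> tags = new ArrayList<>();
        while (lines.hasNext()) {
            String line = lines.next().trim();
            if (line.isEmpty()) {
                if (!tokens.isEmpty()) {
                    break; // blank line closes the current sentence
                }
                continue; // skip blank lines before a sentence starts
            }
            String[] cols = line.split("\\s+");
            if (cols.length < 2) {
                throw new IllegalArgumentException(
                        "Expected 'token tag', got: " + line);
            }
            tokens.add(cols[0]);
            tags.add(cols[cols.length - 1]);
        }
        // No tokens collected means the input has ended.
        return tokens.isEmpty() ? null : new TaggedSentence(tokens, tags);
    }
}

final class ColumnFormatDemo {
    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Jan B-PER", "Kowalski I-PER", "", "Warszawa B-LOC");
        SampleStream<TaggedSentence> stream =
                new ColumnFormatSampleStream(input);
        TaggedSentence sample;
        while ((sample = stream.read()) != null) {
            System.out.println(sample.tokens + " " + sample.tags);
        }
    }
}
```

The important part is the contract: each call to read() returns one complete 
sample, and null signals the end of the data. A real format stream would 
build the actual XyzSample objects instead of TaggedSentence.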

To integrate the format into the command line interface, you have to implement 
a stream factory; Conll02NameSampleStreamFactory is an example.

Hope that helps!

> Polish language support - Maxent binaries
> -----------------------------------------
>
>                 Key: OPENNLP-701
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-701
>             Project: OpenNLP
>          Issue Type: New Feature
>            Reporter: Chris Krol / IBM
>            Priority: Minor
>
> Hi, 
> Currently I'm working at IBM Poland and my manager approved the idea of 
> contributing various Maxent binaries for Polish language (sentence split, 
> sentence detection, POS tagging and morphological analysis, NER). 
> You could possibly put them on your download page. 
> We trained them using the Golden Standard human-annotated Polish National 
> Corpus (GPL 3.0). 
> Would it also be possible to give some credit for the fact that the 
> job was done at IBM?
> I've already sent a mail to the devs, but haven't seen any response for two 
> weeks now. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
