[ 
https://issues.apache.org/jira/browse/OPENNLP-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Wiesner closed OPENNLP-1615.
-----------------------------------

> Provide more languages for pre-trained UD-based OpenNLP models 
> ---------------------------------------------------------------
>
>                 Key: OPENNLP-1615
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1615
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Models
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Major
>             Fix For: 2.4.1
>
>
> As https://universaldependencies.org/ offers treebanks for many languages,
> we should add further basic, pre-trained models (sentence detection,
> tokenization, POS tagging).
> A first investigation has shown promising results for the following languages:
> * “Bulgarian|bg|BTB”
> * “Czech|cs|PDT”
> * “Croatian|hr|SET”
> * “Danish|da|DDT”
> * “Estonian|et|EDT”
> * “Finnish|fi|TDT”
> * “Latvian|lv|LVTB”
> * “Norwegian|no|Bokmaal”
> * “Polish|pl|PDB”
> * “Portuguese|pt|GSD”
> * “Romanian|ro|RRT”
> * “Russian|ru|GSD”
> * “Serbian|sr|SET”
> * “Slovak|sk|SNK”
> * “Slovenian|sl|SSJ”
> * “Spanish|es|GSD”
> * “Swedish|sv|Talbanken”
> * “Ukrainian|uk|IU”
> Training succeeded for all of them, and the evaluation results showed solid
> to excellent performance.
> The previously available languages (EN, FR, DE, NL, IT) should also be
> retrained.
> Aims: 
> * (Re-)Train the three models per language listed above with UD release 2.14
> * Package and release as JAR files via Maven Central
> * Optional (?): Release the model files via the classic channel (website)
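
The entries in the language list above follow a `Language|code|treebank` pattern. As a minimal sketch of how such an entry could be turned into the names needed to locate the training data, the record below parses one entry and derives a repository name and a training file name. The record `UdTreebank` and its methods are hypothetical and not part of OpenNLP. The naming patterns (`UD_<Language>-<Treebank>` and `<code>_<treebank>-ud-train.conllu`) are assumed from the usual Universal Dependencies layout and should be checked against the UD 2.14 release. The sketch also assumes the surrounding quotation marks have already been stripped from each entry.

```java
import java.util.Locale;

/** Hypothetical helper: one "Language|code|treebank" entry from the issue's list. */
record UdTreebank(String language, String isoCode, String treebank) {

    /** Parses an entry such as "Bulgarian|bg|BTB" (quotation marks already removed). */
    static UdTreebank parse(String entry) {
        String[] parts = entry.strip().split("\\|");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected Language|code|treebank, got: " + entry);
        }
        return new UdTreebank(parts[0], parts[1], parts[2]);
    }

    /** Assumed UD repository naming scheme: UD_<Language>-<Treebank>. */
    String repoName() {
        return "UD_" + language + "-" + treebank;
    }

    /** Assumed UD training file naming scheme: <code>_<treebank>-ud-train.conllu. */
    String trainFile() {
        return isoCode + "_" + treebank.toLowerCase(Locale.ROOT) + "-ud-train.conllu";
    }
}
```

For instance, `UdTreebank.parse("Norwegian|no|Bokmaal")` would yield the repository name `UD_Norwegian-Bokmaal` and the training file name `no_bokmaal-ud-train.conllu` under these assumptions.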



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
