Hi folks,

We have posted the first release candidate for the Apache OpenNLP Pre-Trained Models release, version 1.1, and it is ready for testing.
Here are the changes compared to version 1.0:

- Trained using Apache OpenNLP 2.4.0. (The models should work with other OpenNLP versions but were trained and tested using this version.)
- Trained on the Universal Dependencies corpus, version 2.14.
- The French models achieved very low performance scores in version 1.0. This was due to an error in the FTB treebank, which is now discontinued. We are now using GSD instead, and these models achieve much better performance.
- We now provide models for the following new languages:
  - "Bulgarian|bg|BTB"
  - "Czech|cs|PDT"
  - "Croatian|hr|SET"
  - "Danish|da|DDT"
  - "Estonian|et|EDT"
  - "Finnish|fi|TDT"
  - "Latvian|lv|LVTB"
  - "Norwegian|no|Bokmaal"
  - "Polish|pl|PDB"
  - "Portuguese|pt|GSD"
  - "Romanian|ro|RRT"
  - "Russian|ru|GSD"
  - "Serbian|sr|SET"
  - "Slovenian|sl|SSJ"
  - "Slovak|sk|SNK"
  - "Spanish|es|GSD"
  - "Swedish|sv|Talbanken"
  - "Ukrainian|uk|IU"

Thank you to everyone who contributed to this model release.

Models:
https://dist.apache.org/repos/dist/dev/opennlp/models/ud-models-1.1-rc1/

The results of the eval tests for each model are contained in "opennlp-training-eval-logs-1.1-2.4.0.zip" on dist/dev.

Reminder: The up-to-date KEYS file for signature verification can be found here:
https://dist.apache.org/repos/dist/release/opennlp/KEYS

Please vote on releasing these packages as Apache OpenNLP Pre-Trained Models v1.1. The vote is open for at least the next 72 hours.

Only votes from the OpenNLP PMC are binding, but everyone is welcome to check the release candidate and vote. The vote passes if at least three binding +1 votes are cast.

Please VOTE

[+1] go ship it
[+0] meh, don't care
[-1] stop, there is a ${showstopper}

Note: I prepared the models in a video session with Martin W. :)

Thanks!
Martin & Richard