Hi folks, I've posted a 1st release candidate for the Apache OpenNLP Pre-Trained Model release version 1.2 and it is ready for testing.
Here are the changes compared to version 1.1:

- All models have been trained on the Universal Dependencies corpus version 2.15 using Apache OpenNLP 2.5.0 (they should also work with previous OpenNLP versions).
- Training was conducted with 300 iterations instead of 100, which should result in better overall model performance.
- Please note:
  - The Lemmatizer model type was added for all existing and new languages.
  - We now provide models of all four types for the following 9 new languages:
    - “Armenian|hy|BSUT”
    - “Basque|eu|BDT”
    - “Catalan|ca|AnCora”
    - “Georgian|ka|GLC”
    - “Greek|el|GDT”
    - “Kazakh|kk|KTB”
    - “Korean|ko|Kaist”
    - “Icelandic|is|IcePaHC”
    - “Turkish|tr|BOUN”
- Refer to the opennlp-training-eval-logs-1.2-2.5.0.zip file for the individual model training and evaluation logs.

In total, OpenNLP now provides pre-trained models for 32 languages.

Models:
https://dist.apache.org/repos/dist/dev/opennlp/models/ud-models-1.2-rc1/

The results of the eval tests for each model are contained in "opennlp-training-eval-logs-1.2-2.5.0.zip" on dist/dev.

Reminder: The up-to-date KEYS file for signature verification can be found here:
https://dist.apache.org/repos/dist/release/opennlp/KEYS

Please vote on releasing these packages as Apache OpenNLP Pre-Trained Models v1.2.

The vote is open for at least the next 72 hours. Only votes from OpenNLP PMC members are binding, but everyone is welcome to check the release candidate and vote. The vote passes if at least three binding +1 votes are cast.

Please VOTE

[+1] go ship it
[+0] meh, don't care
[-1] stop, there is a ${showstopper}

Thanks!
Martin