Hi folks,

I've posted the first release candidate (RC1) of the Apache OpenNLP Pre-Trained 
Models release 1.2, and it is ready for testing.

Here are the changes compared to version 1.1:

- All models have been trained on the Universal Dependencies corpus version 
2.15 using Apache OpenNLP 2.5.0 (and should work with previous OpenNLP 
versions).
- Training was conducted with 300 iterations instead of 100, which should result 
in better overall model performance.
- Please note:
        - The Lemmatizer model type was added for all existing and new languages.
        - We now provide models of all four types for the following 9 new languages:
           -  “Armenian|hy|BSUT”
           -  “Basque|eu|BDT”
           -  “Catalan|ca|AnCora”
           -  “Georgian|ka|GLC”
           -  “Greek|el|GDT”
           -  “Kazakh|kk|KTB”
           -  “Korean|ko|Kaist”
           -  “Icelandic|is|IcePaHC”
           -  “Turkish|tr|BOUN”
- Refer to the opennlp-training-eval-logs-1.2-2.5.0.zip file for the individual 
model training and evaluation logs.
   
In total, OpenNLP now provides pre-trained models for 32 languages.
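
For anyone scripting their checks of the RC, here is a minimal sketch of how the new-language entries listed above could be split into fields. It assumes each entry reads as language name, two-letter language code, and UD treebank identifier; that reading of the third field is my assumption, and the names `ModelEntry` and `parse_entry` are purely illustrative, not part of OpenNLP.

```python
from typing import NamedTuple


class ModelEntry(NamedTuple):
    language: str  # human-readable language name, e.g. "Armenian"
    code: str      # two-letter language code, e.g. "hy"
    treebank: str  # assumed to be the UD treebank identifier, e.g. "BSUT"


def parse_entry(raw: str) -> ModelEntry:
    """Split one 'Language|code|Treebank' entry into its three fields."""
    # Drop surrounding whitespace and the straight or curly quotes
    # used around each entry in the announcement.
    cleaned = raw.strip().strip('"\u201c\u201d')
    parts = cleaned.split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}: {raw!r}")
    return ModelEntry(*parts)


# The nine new languages, copied from the list in this mail.
NEW_LANGUAGES = [
    "Armenian|hy|BSUT",
    "Basque|eu|BDT",
    "Catalan|ca|AnCora",
    "Georgian|ka|GLC",
    "Greek|el|GDT",
    "Kazakh|kk|KTB",
    "Korean|ko|Kaist",
    "Icelandic|is|IcePaHC",
    "Turkish|tr|BOUN",
]

entries = [parse_entry(item) for item in NEW_LANGUAGES]
codes = sorted(entry.code for entry in entries)

for entry in entries:
    print(f"{entry.code}: {entry.language} ({entry.treebank})")
```

The language codes collected this way could then be compared against the file names found in the RC directory; how the files are actually named is something to check against the directory listing itself.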

Models:

https://dist.apache.org/repos/dist/dev/opennlp/models/ud-models-1.2-rc1/

The results of the eval tests for each model are contained in the 
"opennlp-training-eval-logs-1.2-2.5.0.zip" on dist/dev.

Reminder: The up-to-date KEYS file for signature verification can be
found here: https://dist.apache.org/repos/dist/release/opennlp/KEYS

Please vote on releasing these packages as Apache OpenNLP Pre-Trained Models 
v1.2.
The vote is open for at least the next 72 hours.

Only votes from OpenNLP PMC members are binding, but everyone is welcome to check 
the release candidate and vote.
The vote passes if at least three binding +1 votes are cast.

Please VOTE

[+1] go ship it
[+0] meh, don't care
[-1] stop, there is a ${showstopper}

Thanks!
Martin
