Hi,

Any objections to (or support for) doing a release (1.2) of the
pre-trained models? Details below:

I’ve added 9 new languages (via UD treebanks) to our model list:

- "Armenian|hy|BSUT"
- "Basque|eu|BDT"
- "Catalan|ca|AnCora"
- "Georgian|ka|GLC"
- "Greek|el|GDT"
- "Kazakh|kk|KTB"
- "Icelandic|is|IcePaHC"
- "Korean|ko|Kaist"
- "Turkish|tr|BOUN"

In total, this results in 32 supported languages, each with sentence detection, 
tokenization and POS tagging.

Moreover, the updated ud-train script now also produces ME models for the
Lemmatizer component, 32 in total.

The training was done with OpenNLP 2.5.0 and the treebanks from the
latest UD release, dated Nov 15, 2024.

I can act as RM.

Best
Martin
