Hi,

Any objections to (or support for) doing a release (1.2) of the pre-trained models? Details below:

I’ve added 9 new languages (via UD treebanks) to our model list:

- "Armenian|hy|BSUT"
- "Basque|eu|BDT"
- "Catalan|ca|AnCora"
- "Georgian|ka|GLC"
- "Greek|el|GDT"
- "Kazakh|kk|KTB"
- "Korean|ko|Kaist"
- "Icelandic|is|IcePaHC"
- "Turkish|tr|BOUN"

In total, this results in 32 supported languages, each with sentence detection, tokenization and POS tagging.

Moreover, the updated ud-train script now produces ME models for the Lemmatizer component, 32 of them.

The training was conducted with OpenNLP 2.5.0 and with the treebanks from the latest UD release, dated Nov 15, 2024.

I can act as RM.

Best
Martin
