[Apertium-stuff] Thoughts on UDPipe, Apertium modules and translation system for Interslavic

Виктор Булатов Thu, 16 Dec 2021 14:20:36 -0800

Hi everyone. Interslavic is a constructed language designed so that
speakers of Slavic languages can understand most of it without any prior
study. It has a Wikipedia page and everything (maybe we will even have an
ISO 639-3 code, "isv", in the future, fingers crossed!).


I'm looking into developing some sort of MT system for Interslavic (mainly
the "some natural Slavic language -> Interslavic" direction). I managed to
cobble together a prototype from the Russian UDPipe model and ISV
morphological data/rules before finding out about Apertium (and you guys
seem interesting).

The thing is, Russian and Czech are probably the richest Slavic languages
in terms of NLP resources. Apertium obviously isn't going to beat a
dependency parser that was trained on more than 1M labeled sentences, so I
don't really need any of the earlier stages of the Apertium pipeline.
However, the chunking and multi-word-expression modules seem promising,
especially since I could probably reuse existing rules (they are written
for other Slavic languages, but that shouldn't matter).

So, my question is: is it possible to use the chunking module in isolation,
preferably in a way that allows manipulation of UDPipe's dependency trees?
For example, I would like to ensure gender agreement between a noun and the
adjectives attached to it.
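
To make the agreement example concrete, here is a minimal Python sketch of
the kind of tree manipulation meant above. It is not Apertium's chunker and
not part of UDPipe: it only assumes that the parser output is in the
CoNLL-U format, and the function names (parse_conllu, agree_gender) and the
two-token sample are hypothetical, made up for illustration.

```python
# Hypothetical sketch: works on CoNLL-U text such as UDPipe produces.
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC

def parse_conllu(text):
    """Parse one CoNLL-U sentence into a list of token dicts."""
    tokens = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        feats = {}
        if cols[5] != "_":
            feats = dict(f.split("=", 1) for f in cols[5].split("|"))
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "feats": feats,
            "head": int(cols[6]) if cols[6].isdigit() else 0,
            "deprel": cols[7],
        })
    return tokens


def agree_gender(tokens):
    """Copy Gender from each noun to its 'amod' adjectives.

    Returns the ids of the tokens whose features were changed.
    Only the feature is updated; regenerating the surface form is
    left to a separate morphological generator.
    """
    by_id = {t["id"]: t for t in tokens}
    changed = []
    for tok in tokens:
        head = by_id.get(tok["head"])  # None for the root (head == 0)
        if head is None or head["upos"] != "NOUN":
            continue
        if tok["upos"] != "ADJ" or tok["deprel"] != "amod":
            continue
        gender = head["feats"].get("Gender")
        if gender and tok["feats"].get("Gender") != gender:
            tok["feats"]["Gender"] = gender
            changed.append(tok["id"])
    return changed


# Made-up sample: the adjective carries a gender that no longer matches
# its head noun (e.g. after the noun lemma was replaced during transfer).
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "novy", "novy", "ADJ", "_",
     "Case=Nom|Gender=Masc|Number=Sing", "2", "amod", "_", "_"],
    ["2", "kniga", "kniga", "NOUN", "_",
     "Case=Nom|Gender=Fem|Number=Sing", "0", "root", "_", "_"],
])

tokens = parse_conllu(SAMPLE)
changed_ids = agree_gender(tokens)
```

After this runs, the adjective's Gender feature matches its head noun, and
the ISV morphological data would then be needed to produce the inflected
form.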

I would be happy to hear any other advice!

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
