Hi everyone. Interslavic is a constructed language designed so that people from Slavic countries can understand most of it without any prior study. It has a Wikipedia page and everything (we may even get an ISO 639-3 code, "ISV", in the future, fingers crossed!).
I'm looking into developing some sort of MT system for Interslavic (mainly the "some natural Slavic language -> Interslavic" direction). I managed to cobble together a prototype with Russian UDPipe and ISV morphological data/rules before finding out about Apertium (and you guys seem interesting).

The thing is, Russian and Czech are probably the richest Slavic languages in terms of NLP resources. Apertium obviously isn't going to beat a dependency parser that was trained on >1M labeled sentences, so I don't really need any of the earlier stages of the Apertium pipeline. However, the chunking and multi-word-expression modules seem promising, especially given that I could probably reuse existing rules (they were written for different Slavic languages, but that doesn't matter).

So, my question is: is it possible to use the chunking module in isolation? Preferably in a way that allows manipulation of UDPipe's dependency trees, for example to ensure gender agreement between a noun and the adjectives attached to it.

I would be happy to hear any other advice!
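To make the gender-agreement example concrete, here is a minimal Python sketch of that kind of tree manipulation. It is not Apertium code and not the prototype mentioned above. It assumes the parser output is in CoNLL-U format with Universal Dependencies labels (`amod` for adjectival modifiers, a `Gender` feature in the FEATS column), and the function names are made up for illustration. It only rewrites the feature column; regenerating the adjective's surface form would still be the job of a morphological generator.

```python
def parse_feats(feats):
    """Turn 'Case=Nom|Gender=Masc' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats):
    """Serialize a feature dict back into the CoNLL-U FEATS column."""
    if not feats:
        return "_"
    pairs = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in pairs)


def agree_gender(conllu_sentence):
    """Copy Gender from each noun to the adjectives attached to it via amod.

    CoNLL-U columns used here: 0=ID, 3=UPOS, 5=FEATS, 6=HEAD, 7=DEPREL.
    Comment lines (starting with '#') are passed through unchanged.
    """
    parsed = [
        line if line.startswith("#") else line.split("\t")
        for line in conllu_sentence.strip().splitlines()
    ]
    rows = [item for item in parsed if isinstance(item, list)]
    by_id = {row[0]: row for row in rows}

    for row in rows:
        head = by_id.get(row[6])
        if head is None or row[3] != "ADJ" or row[7] != "amod":
            continue
        if head[3] not in ("NOUN", "PROPN"):
            continue
        gender = parse_feats(head[5]).get("Gender")
        if gender:
            feats = parse_feats(row[5])
            feats["Gender"] = gender  # the noun's gender wins
            row[5] = format_feats(feats)

    return "\n".join(
        item if isinstance(item, str) else "\t".join(item) for item in parsed
    )
```

Given a sentence where lexical transfer has left a masculine adjective attached to a feminine noun, the adjective's FEATS column comes back with `Gender=Fem`, and every other column is left as it was.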
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff