... and right after sending my previous email I decided that creating such a prototype was a more fun use of my flight home than treebanking, so here's what I came up with: https://github.com/mr-martian/UD-transfer
It probably needs another couple hours of work before it'll actually do anything, but I should be able to manage that pretty soon (the hardest part of any project is starting).

Daniel

On Thu, Dec 16, 2021 at 7:17 PM Daniel Swanson <awesomeevildu...@gmail.com> wrote:
>
> Greetings Apertiumers!
>
> Figuring out how to incorporate UD parsers into Apertium pipelines is
> something that's been on my todo list for a while, but with the
> unfortunate property that it keeps getting sidelined by projects that
> have deadlines.
>
> With regards to your specific issue, here are the options I can think of:
>
> 1. apertium-transfer / chunking
> The chunker can pretty much only process adjacent words. You can
> encode dependency labels to some extent (e.g.
> ^green/green<adj><sint><@amod>$), and the rules can refer to those
> tags, but I don't think there's any way to access the actual relations
> that isn't incredibly hacky and fragile.
>
> 2. apertium-recursive
> This was created precisely because chunking can't handle long distance
> relationships, but to actually use it, you'd end up somehow encoding
> and then re-parsing the tree structure which is still fairly fragile
> while also probably being an enormous waste of energy.
>
> 3. Constraint Grammar
> VISL CG-3 can manipulate dependency trees and writing agreement rules
> would be fairly straightforward, though you'd have to write them from
> scratch rather than copying from existing sources.
>
> 4. Bug me to make a real solution
> Prototyping a pipeline module to do pretty much exactly what you're
> talking about is nominally fairly high on my todo list, and if someone
> is actually waiting for it there's a decent amount of hope that I'll
> actually start it rather than some other project.
>
> If your main concern is agreement, 3 strikes me as a pretty good
> option. On the other hand, if you actually need to modify the tree
> structure, 3 might get complicated in which case I'd recommend 4.
>
> Daniel
>
> On Thu, Dec 16, 2021 at 5:20 PM Виктор Булатов <bt.uy...@gmail.com> wrote:
> >
> > Hi everyone. The Interslavic language is a constructed language that is
> > created in such a way that people from Slavic countries are able to
> > understand most of it without any prior education. It has a Wikipedia page
> > and everything (maybe we even will have an ISO-639-3 code "ISV" in the
> > future, fingers crossed!).
> >
> > I'm looking into developing some sort of MT system for Interslavic (mainly
> > the "Some Natural Slavic Language -> Interslavic" direction). I've managed
> > to cobble a prototype with Russian UDPipe and ISV morphological data/rules
> > before finding out about Apertium (and you guys seem interesting).
> >
> > The thing is, Russian and Czech are probably the richest Slavic languages
> > in terms of NLP resources. Apertium obviously isn't going to beat a
> > dependency parser that was trained on >1M of labeled sentences. So, I don't
> > really need any of the earlier stages of the Apertium pipeline. However,
> > the chunking and multi-word-expression modules seem promising, especially
> > given that I probably could re-use already existing rules (that are written
> > for different Slavic languages, but it doesn't matter).
> >
> > So, my question is: is it possible to use the chunking module in isolation?
> > Preferably in a way that allows manipulation of UDPipe's dependency trees?
> > For example, to ensure gender agreement between a noun and attached
> > adjectives.
> >
> > I would be happy to hear any other advice!
> > _______________________________________________
> > Apertium-stuff mailing list
> > Apertium-stuff@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff
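
For illustration, the gender-agreement case from the original question can be sketched directly on a parser's CoNLL-U output, outside of any Apertium module. This is only a rough sketch under stated assumptions: it is not how UD-transfer, the chunker, or CG-3 work, the function names and the two-token sample sentence are made up for the example, and it only rewrites the FEATS column (regenerating the surface form would still need a morphological generator).

```python
# Hypothetical sketch: copy Gender from a head noun to its amod dependents.
# Column positions follow CoNLL-U: 0=ID, 5=FEATS, 6=HEAD, 7=DEPREL.

def parse_feats(feats):
    """Turn 'Case=Nom|Gender=Fem' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def format_feats(feats):
    """Inverse of parse_feats, with features sorted by name (case-insensitive)."""
    if not feats:
        return "_"
    items = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{k}={v}" for k, v in items)

def agree_gender(conllu):
    """Return one sentence with each amod dependent taking its head's Gender."""
    rows = [line.split("\t") for line in conllu.strip().split("\n")
            if line and not line.startswith("#")]
    by_id = {row[0]: row for row in rows}
    for row in rows:
        if row[7] != "amod":
            continue
        head = by_id.get(row[6])
        if head is None:
            continue
        gender = parse_feats(head[5]).get("Gender")
        if gender is None:
            continue  # head has no Gender feature, nothing to copy
        feats = parse_feats(row[5])
        feats["Gender"] = gender
        row[5] = format_feats(feats)
    return "\n".join("\t".join(row) for row in rows)

# Made-up sample: a masculine adjective attached to a feminine noun.
SAMPLE = "\n".join([
    "\t".join(["1", "новый", "новый", "ADJ", "_",
               "Case=Nom|Gender=Masc|Number=Sing", "2", "amod", "_", "_"]),
    "\t".join(["2", "книга", "книга", "NOUN", "_",
               "Case=Nom|Gender=Fem|Number=Sing", "0", "root", "_", "_"]),
])

if __name__ == "__main__":
    print(agree_gender(SAMPLE))
```

Because the head is looked up by ID rather than by position, the adjective and noun do not need to be adjacent, which is the limitation of the chunker mentioned in option 1.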