On Tue, Mar 7, 2023 at 6:07 AM Kevin Brubeck Unhammer <unham...@fsfe.org> wrote:
>
> Daniel Swanson <awesomeevildu...@gmail.com> wrote:
>
> > Greetings Apertiumers!
> >
> > This morning I set out to change the Ancient Hebrew analyzer from
> > Latin script to Hebrew script (a task I don't wish upon anyone) and in
> > the process produced a search-and-replace tool that understands the
> > structure of several of our source files:
> > https://github.com/mr-martian/apertium-grep
>
> Awesome!
>
> > This script could, without too much trouble, be expanded to cover the
> > rest of our source files, at which point I would like to propose that
> > we move towards greater standardization of our tagset:
> > https://wiki.apertium.org/wiki/List_of_symbols
> >
> > At minimum, I would like to deal with some of the duplicate tags, like
> > impf/imperf, rec/res, v/vblex, pass/pasv, etc.
>
> That would be great! I'll put in a vote for pasv right now.
>
> > My preference would be that we also consider splitting compound tags,
> > like the tense+mood (fti, fts, pii, pis) and maybe possessor and
> > subject tags (px1sg, s_1sg).
>
> It makes sense to split tense and mood, as well as number and person,
> but I doubt it can be done automatically – it will require careful
> changes to CG and transfer. Might make sense to try it on one language
> pair along with the maintainer and see how it goes.
>
> It would be very dangerous to turn <pxsg> into <px><sg> – that would
> break lots of CG and transfer rules and possibly lead to more complexity
> in tag matching since you now have to always check for the existence of
> <px> wherever you check for <sg> etc.

To be clear, I meant splitting <px1sg> into <px1><pxsg>. One of my ideals for the tagset is that every tag be position-independent, so that the only reason I need to care about order is because of FST topology (and maybe not even then).

Daniel


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
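
As an illustration of the <px1sg> to <px1><pxsg> split mentioned above, here is a minimal Python sketch. It is not taken from apertium-grep. The tag pattern, the function names `split_px_tags` and `tag_set`, and the sample analysis string are all made up for the example, and the real inventory of possessive tags varies by language pair.

```python
import re

# Assumed shape of compound possessive tags: <px1sg>, <px2pl>, <px3sg>, ...
# This inventory is hypothetical; real language pairs may use other values.
PX_TAG = re.compile(r"<px([123])(sg|pl)>")


def split_px_tags(analysis: str) -> str:
    """Rewrite each <pxNnum> tag as <pxN><pxnum>; other tags are untouched."""
    return PX_TAG.sub(r"<px\1><px\2>", analysis)


def tag_set(analysis: str) -> frozenset:
    """Return the tags of an analysis as an unordered set of names."""
    return frozenset(re.findall(r"<([^<>]+)>", analysis))


# Made-up analysis string: a plural noun with a 1st person singular possessor.
before = "talo<n><pl><nom><px1sg>"
after = split_px_tags(before)
tags = tag_set(after)

print(after)
# The possessor's number is <pxsg>, not <sg>, so a membership test for "sg"
# still refers only to the noun's own number, whatever the tag order.
print("sg" in tags, "pl" in tags, "pxsg" in tags)
```

Because the possessor's number keeps its `px` prefix, a rule that checks for `sg` does not also need to check whether `px` is present, and the check can work on an unordered set of tags.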