On 2023-06-07 16:19, Daniel Swanson wrote:
Greetings Apertiumers!

I've been reminded that derivational morphology exists, which throws a
wrench in my desire for fully position-independent tags.

I've also been reminded that some repos have .udx files which specify
a conversion between Apertium tags and Universal Dependencies, but as
far as I know there isn't any documentation for this and I'm not even
sure where to find the script that processes them. Does anyone have
any further information on those files? I think it could be quite
useful to document and standardize them and adopt them more broadly.

Daniel

On Tue, Mar 7, 2023 at 2:22 PM Daniel Swanson
<awesomeevildu...@gmail.com> wrote:

Yes, most of our tools assume that tags are position independent, but
I've come across a handful of languages that treat some tags as
position dependent, and I was more hoping to make it official to make
it less likely that we run into issues with that.

Also, I have an idea for how to make a version of lt-proc -g that
accepts the tags in any order, which might be helpful for reducing
generation errors, though it may turn out to be too much of a slowdown
for production.

Daniel

On Tue, Mar 7, 2023 at 1:58 PM Kevin Brubeck Unhammer <unham...@fsfe.org> wrote:
>
> Daniel Swanson
> <awesomeevildu...@gmail.com> wrote:
>
> > To be clear, I meant splitting <px1sg> into <px1><pxsg>.
>
> 👍
>
> > One of my ideals for the tagset is that every tag be
> > position-independent, so that the only reason I need to care about
> > order is because of FST topology (and maybe not even then).
>
> Aren't the tags themselves already position-independent? Both CG and to
> a certain extent transfer assume that.
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff



I made the UDX format for converting vislcg3-style treebanks to
UD-style ones.

It works mostly by longest-overlap set matching on the input tags. One
challenge is that the same tag can map to different features depending
on the tags it occurs with, e.g. <prn><rel> vs. <n><rel>:

<prn><rel> -> PRON PronType=Rel
<n><rel> -> NOUN NounType=Relat
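
As a rough illustration of the matching idea (not the code from my
scripts), here is a minimal Python sketch. It assumes that a rule applies
when all of its tags occur in the input and that the rule with the most
tags wins. The RULES table, the function names and the bare <n> fallback
rule are made up for this example; only the two mappings above come from
real data.

```python
import re

# Hypothetical rule table: (set of Apertium tags, UD POS, UD features).
# Only the first two mappings come from the examples above; the bare
# <n> rule is an invented fallback to show why the longest match matters.
RULES = [
    (frozenset({"prn", "rel"}), "PRON", {"PronType": "Rel"}),
    (frozenset({"n", "rel"}), "NOUN", {"NounType": "Relat"}),
    (frozenset({"n"}), "NOUN", {}),
]

TAG_RE = re.compile(r"<([^<>]+)>")


def parse_tags(analysis):
    """Return the tags of an analysis such as '<n><rel>' as a set."""
    return frozenset(TAG_RE.findall(analysis))


def convert(analysis, rules=RULES):
    """Pick the rule sharing the most tags with the input.

    A rule is a candidate only if every one of its tags is present in
    the input. Among candidates, the one with the most tags wins; on a
    tie, the rule listed first is kept. Returns (upos, feats) or None.
    """
    tags = parse_tags(analysis)
    best = None
    for rule_tags, upos, feats in rules:
        if rule_tags <= tags:
            if best is None or len(rule_tags) > len(best[0]):
                best = (rule_tags, upos, feats)
    if best is None:
        return None
    _, upos, feats = best
    # CoNLL-U writes an empty feature column as "_".
    feat_str = "|".join(f"{k}={v}" for k, v in sorted(feats.items())) or "_"
    return upos, feat_str
```

With this table, convert("<n><rel><sg>") picks the two-tag <n><rel> rule
over the bare <n> rule, while convert("<n><sg>") falls back to plain
NOUN. Because the input is treated as a set, tag order does not matter.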

I have a tonne of scripts that do it, one of which is:

https://github.com/ftyers/ud-scripts/blob/master/conllu-feats.py

I'd be happy to work on this topic as I find it interesting and there
are some substantial improvements that could be made over my existing
code.

Fran





