On 2020-06-13 19:31, Xavi Ivars wrote:
Before anything, let me say that I like the proposal to enhance the
pipeline with more data (including, but not limited to, the surface
forms), to be able to properly do things that we're currently doing
in veeeery hacky (to me) and definitely non-linguistic ways.

xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d . spa-morph
^El/el<det><def><m><sg>$

^mango/mango<n><m><sg>/mangar<vblex><pri><p1><sg>/MANGO_FRUTA<N><M><SG>$^./.<sent>$

In this example, we "add" semantic information to the pipeline (and
disambiguate via CG3) by creating a "fake lemma" needed for SPA-CAT,
because "mango<n>" (handle, as of a pan) and "mango_fruta<n>" are
translated differently in Catalan. But this, in turn, forces every other
language pair using Spanish to know about "mango_fruta<n>" even if its
translation is the same as that of "mango<n>".
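
To make the ambiguity concrete, here is a minimal Python sketch that splits
the "mango" lexical unit from the output above into its readings. The
function name parse_lexical_unit is hypothetical (it is not part of
Apertium), and the parsing assumes a simplified stream format with no
escaped characters inside the unit.

```python
import re


def parse_lexical_unit(unit):
    """Split one ^surface/reading1/reading2$ unit into its parts.

    Hypothetical helper for illustration only. It assumes that neither the
    surface form nor any lemma contains an escaped '/', '^' or '$'.
    """
    surface, *analyses = unit.strip("^$").split("/")
    readings = []
    for analysis in analyses:
        # The lemma is everything before the first tag.
        lemma = analysis.split("<", 1)[0]
        # Tags are the <...> groups that follow the lemma.
        tags = re.findall(r"<([^>]+)>", analysis)
        readings.append((lemma, tags))
    return surface, readings


# The unit is copied from the spa-morph output quoted above.
unit = "^mango/mango<n><m><sg>/mangar<vblex><pri><p1><sg>/MANGO_FRUTA<N><M><SG>$"
surface, readings = parse_lexical_unit(unit)

for lemma, tags in readings:
    print(surface, "->", lemma, tags)
```

Two of the three readings are nouns with different lemmas, and the lemma of
the "fake" one is the only place where the fruit sense is encoded.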


What is the problem here? That "mango" has two possible lemmas and
paradigms in Spanish?

The way that I've treated that is to have mango¹ and mango², like in a
traditional dictionary. I don't think that this requires any further
information.

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
