Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

Francis Tyers Sun, 14 Jun 2020 10:46:15 -0700

On 2020-06-14 11:51, Hèctor Alòs i Font wrote:

Message from Francis Tyers <fty...@prompsit.com> on Sun., 14 June 2020 at 10:32:

On 2020-06-13 23:18, Jonathan Washington wrote:

On Sat, Jun 13, 2020, 16:05 Francis Tyers <fty...@prompsit.com> wrote:

On 2020-06-13 19:31, Xavi Ivars wrote:

Before anything, let me say that I like the proposal to enhance the pipeline with more data (including, but not limited to, the surface forms), to be able to properly do things that we currently do in veeeery hacky (to me) and definitely non-linguistic ways:

xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d . spa-morph
^El/el<det><def><m><sg>$ ^mango/mango<n><m><sg>/mangar<vblex><pri><p1><sg>/MANGO_FRUTA<N><M><SG>$^./.<sent>$


In this example, we "add" semantic information to the pipeline (and disambiguate via CG3) by creating a "fake lemma" needed for SPA-CAT, because "mango<n>" (a handle, e.g. of a pan) and "mango_fruta<n>" are translated differently in Catalan. But this, in turn, forces every other language pair using Spanish to know about "mango_fruta<n>", even if the translation is the same as for "mango<n>".


What is the problem here? That "mango" has two possible lemmas and paradigms in Spanish?

The way that I've treated that is to have mango¹ and mango², like in a traditional dictionary. I don't think that this requires any further information.


I think Xavi's point is that there are a number of ways to approach this, and having the option of another stream in which to put this extra information could be one of them. IMHO, it is nicer in many ways than even having (very arbitrary) superscripts (which aren't really any better to have in a morphological analysis than _fruta).


It's following what the lexicographers do:

https://dle.rae.es/?w=mango

So it's following a fairly established practice.

Fran


As far as I understand the mango issue, Xavi is contemplating the
possibility of a semantic module that would add extra information
about "mango", which could then be used by other modules (especially
lexical selection). This could be used to distinguish between a handle
and a fruit, but not only that: "mango" can be both the fruit and the
plant. One could also add what kind of handle it is; e.g. in the RAE
dictionary entry that Fran linked, the handle of a knife is
distinguished from other handles. As Xavi shows, this extra
information could be added in such a way that pairs that don't need
it can ignore it. It seems clear that a solution that allows adding
any additional secondary information is more versatile than "_fruta",
"_2" and the like.

Moreover, in lexical selection we have lots of lists like "fruit",
"building", "person", "device", etc. (and where we don't, it is
because of a lack of time to write them). It would be easier if a
module like the one Xavi imagines could add this kind of information
and pass it along the pipeline.
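The idea can be sketched in Python under loudly stated assumptions: the `sem:` tag prefix, the `SEM_LEXICON` table and every function name below are hypothetical, not Apertium's actual secondary-tag syntax or API. A semantic module appends secondary tags to a reading, a pair that does not care strips them, "mango" keeps a single lemma instead of a fake `mango_fruta`, and a class test can stand in for a hand-written word list:

```python
# Hypothetical semantic lexicon (lemma -> semantic classes); illustrative
# data only, not an existing Apertium resource.
SEM_LEXICON = {
    "mango": {"fruit", "plant", "handle"},
    "cuchillo": {"device"},
}
SEM_PREFIX = "sem:"  # hypothetical marker for secondary semantic tags


def add_semantic_tags(lemma, tags, lexicon=SEM_LEXICON):
    """Return the tags plus one secondary 'sem:<class>' tag per known class."""
    return tags + [SEM_PREFIX + c for c in sorted(lexicon.get(lemma, ()))]


def strip_secondary(tags):
    """What a pair that does not use semantic information would keep."""
    return [t for t in tags if not t.startswith(SEM_PREFIX)]


def has_class(tags, cls):
    """A lexical-selection-style test that replaces a hand-written word list."""
    return SEM_PREFIX + cls in tags


def render(lemma, tags):
    """Format a reading in the lemma<tag><tag> style of the stream."""
    return lemma + "".join(f"<{t}>" for t in tags)


tags = add_semantic_tags("mango", ["n", "m", "sg"])
print(render("mango", tags))
# mango<n><m><sg><sem:fruit><sem:handle><sem:plant>
print(render("mango", strip_secondary(tags)))
# mango<n><m><sg>
print(has_class(add_semantic_tags("cuchillo", ["n", "m", "sg"]), "device"))
# True
```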

I am not a technician, nor am I a computational linguist. I don't
know, nor do I understand, the implications of Tanmai and Tino's
proposals in terms of system performance. But from the point of view
of someone with some experience in developing Apertium language pairs,
I would love a tool that allowed adding semantic information to the
pipeline.

Other kinds of contextual information that would also be useful to me
are things like the type of publication (a chat between friends or a
medical encyclopedia?), the dialect, the year of publication, etc.
This would be very useful both for lexical selection and, sometimes,
for transfer rules.


So, if I understand correctly, the desire is for a module that will do
lexical selection based on whole-sentence context. Currently the
"mango" example is essentially getting around the issue of fixed-length
patterns in lexical selection by moving the problem to the
disambiguation component.

I've taken some notes here:

https://wiki.apertium.org/wiki/Semantic_tagging

It would be great to have further examples of the kind of translation
problems that people would like to treat using such a module.

Note that this essentially treats the mango translation issue as a
bag-of-words lexical selection problem, i.e. given these words,
choose this translation.

It would be fairly straightforward to implement that as an option
for the lexical selection module; one could even imagine treating
the context words as features and weighting them.
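A minimal Python sketch of weighted bag-of-words selection, as described above. It is not the existing Apertium lexical selection module: the weight table and the default are invented for illustration (the Catalan equivalents are "mànec" for the handle and "mango" for the fruit). Each candidate translation scores the sum of the weights of its context words that occur anywhere in the sentence, regardless of order or distance, and the default is used when no context word fires:

```python
# Hypothetical weights: (source lemma, translation) -> {context lemma: weight}.
# Invented for illustration; real weights would be hand-written or learnt.
WEIGHTS = {
    ("mango", "mànec"): {"cuchillo": 2.0, "sartén": 2.0, "agarrar": 1.0},
    ("mango", "mango"): {"comer": 2.0, "maduro": 1.5, "fruta": 2.0, "árbol": 1.0},
}
DEFAULTS = {"mango": "mànec"}


def select(lemma, sentence_lemmas, weights=WEIGHTS, defaults=DEFAULTS):
    """Pick a translation for `lemma` from an unordered bag of context lemmas."""
    context = set(sentence_lemmas) - {lemma}
    scores = {}
    for (src, trg), feats in weights.items():
        if src == lemma:
            scores[trg] = sum(w for word, w in feats.items() if word in context)
    # Ties go to the candidate listed first in `weights`.
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0.0:
        return defaults.get(lemma, lemma)
    return best


print(select("mango", ["el", "mango", "del", "cuchillo"]))  # mànec
print(select("mango", ["el", "mango", "estar", "maduro"]))  # mango
print(select("mango", ["el", "mango"]))  # mànec (default)
```

Document-level information of the kind Hèctor mentions (publication type, dialect, year) could, under the same assumptions, be added to the bag as pseudo-words such as `domain:medicine`.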

More examples welcome!

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
