Message from Francis Tyers <fty...@prompsit.com> on Sun, 14 June 2020 at 10:32:

> On 2020-06-13 23:18, Jonathan Washington wrote:
> > On Sat, Jun 13, 2020, 16:05 Francis Tyers <fty...@prompsit.com> wrote:
> >
> >> On 2020-06-13 19:31, Xavi Ivars wrote:
> >>> Before anything, let me say that I like the proposal to enhance the
> >>> pipeline with more data (including, but not limited to, the surface
> >>> forms), to be able to properly do things that we're currently doing
> >>> in veeeery hacky (to me) and definitely non-linguistic ways.
> >>>
> >>>> xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d . spa-morph
> >>>> ^El/el<det><def><m><sg>$ ^mango/mango<n><m><sg>/mangar<vblex><pri><p1><sg>/MANGO_FRUTA<N><M><SG>$^./.<sent>$
> >>>
> >>> In this example, we "add" semantic information to the pipeline (and
> >>> disambiguate via CG3) by creating a "fake lemma" needed for SPA-CAT,
> >>> because "mango<n>" (pan handle) and "mango_fruta<n>" are translated
> >>> differently in Catalan. But this, in turn, forces every other language
> >>> pair using Spanish to know about "mango_fruta<n>" even if its
> >>> translation is the same as that of "mango<n>".
> >>>
> >>
> >> What is the problem here? That "mango" has two possible lemmas and
> >> paradigms in Spanish?
> >>
> >> The way that I've treated that is to have mango¹ and mango², like in a
> >> traditional dictionary. I don't think that this requires any further
> >> information.
> >
> > I think Xavi's point is that there are a number of ways to approach
> > this, and having the option of another stream to put this extra
> > information could be one of them.  Imho, it is nicer in many ways than
> > even having (very arbitrary) superscripts (that aren't really any
> > better to have in a morphological analysis than _fruta).
> >
>
> It's following what the lexicographers do:
>
> https://dle.rae.es/?w=mango
>
> So it's following a fairly established practice.
>
> Fran
>

As far as I understand the "mango" issue, Xavi is contemplating a semantic
module that would add extra information about words such as "mango", which
other modules (especially lexical selection) could then use. This could
distinguish a handle from a fruit, but not only that: "mango" can be both
the fruit and the plant. One could eventually also record what kind of
handle it is; for example, the RAE dictionary entry Fran linked singles out
the handle of a knife among other handles. As Xavi shows, this extra
information could be added in such a way that pairs that don't need it can
ignore it. A solution that allows attaching arbitrary secondary information
seems clearly more versatile than "_fruta", "_2" and the like.
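
To make the idea concrete, here is a minimal Python sketch of it; it is not
an existing Apertium module. It assumes a hypothetical secondary field,
written as {key:value,...} after the analyses, which is not part of the
current stream format. The names LexicalUnit, parse_lu and strip_secondary
are invented for illustration, and escaping rules are ignored.

```python
import re
from dataclasses import dataclass, field

# Hypothetical notation: ^surface/analysis1/analysis2{key:value,key:value}$
# The {...} part is an assumption made for this sketch only.
LU_RE = re.compile(r"\^(?P<body>[^{}$]*)(?:\{(?P<extra>[^}]*)\})?\$")


@dataclass
class LexicalUnit:
    surface: str
    analyses: list
    secondary: dict = field(default_factory=dict)


def parse_lu(text):
    """Split one lexical unit into surface form, analyses and secondary info."""
    m = LU_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a lexical unit: {text!r}")
    surface, *analyses = m.group("body").split("/")
    secondary = {}
    if m.group("extra"):
        for item in m.group("extra").split(","):
            key, _, value = item.partition(":")
            secondary[key.strip()] = value.strip()
    return LexicalUnit(surface, analyses, secondary)


def strip_secondary(lu):
    """Render the unit as seen by a pair that ignores the secondary info."""
    return "^" + "/".join([lu.surface] + lu.analyses) + "$"


lu = parse_lu("^mango/mango<n><m><sg>{sem:fruit}$")
print(lu.secondary)
print(strip_secondary(lu))
```

In this sketch the lemma stays "mango" for every pair, and only the pairs
that care about the distinction need to read the secondary field.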

Moreover, in lexical selection we have lots of lists like "fruit",
"building", "person", "device", etc. (and where we don't, it is only for
lack of time to write them). It would be easier if a module like the one
Xavi imagines could add this kind of information and carry it through the
pipeline.
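
As a rough illustration of how such lists could be shared instead of being
rewritten for each pair, the Python sketch below tags lemmas with semantic
classes from a single table and lets a toy selection rule consult them. The
table contents, the names SEM_CLASSES, annotate and select_cat, and the rule
itself are assumptions made for illustration; this is neither real Apertium
data nor the rule format of the lexical selection module.

```python
# Hypothetical shared table: Spanish lemma -> set of semantic classes.
SEM_CLASSES = {
    "cuchillo": {"device"},
    "sartén": {"device"},
    "médico": {"person"},
}


def annotate(lemmas):
    """Pair every lemma with its semantic classes (empty set if unknown)."""
    return [(lemma, SEM_CLASSES.get(lemma, set())) for lemma in lemmas]


def select_cat(lemmas, target="mango"):
    """Toy rule: choose a Catalan translation for Spanish 'mango'.

    If any other lemma in the sentence is a "device", pick 'mànec' (handle);
    otherwise fall back to 'mango' (fruit), the default in this sketch.
    """
    context = set()
    for lemma, classes in annotate(lemmas):
        if lemma != target:
            context |= classes
    if "device" in context:
        return "mànec"
    return "mango"


print(select_cat(["el", "mango", "de", "el", "cuchillo"]))
print(select_cat(["comer", "un", "mango"]))
```

With the classes supplied once by a shared module, each pair would only
need to write the rule, not the lists.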

I am not a technician, nor am I a computational linguist. I neither know
nor understand the implications of Tanmai's and Tino's proposals for system
performance. But, as someone with some experience in developing Apertium
language pairs, I would love a tool that allowed adding semantic
information to the pipeline.

Other kinds of contextual information that would also be useful to me are
things like the type of publication (a chat between friends or a medical
encyclopedia?), the dialect, the year of publication, etc. This would help
a lot with lexical selection and, sometimes, with transfer rules.

I don't know if this has helped the discussion at all or... whether I've
missed the mark completely.

Hèctor
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
