Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

Samuel Sloniker Sun, 14 Jun 2020 05:33:03 -0700

If we do have a new election, can we vote on the new bylaws first, so we
use STV?


On Sun, Jun 14, 2020, 04:01 Francis Tyers <fty...@prompsit.com> wrote:

> On 2020-06-14 11:51, Hèctor Alòs i Font wrote:
> > Message from Francis Tyers <fty...@prompsit.com> on Sun, 14 June
> > 2020 at 10:32:
> >
> >> On 2020-06-13 23:18, Jonathan Washington wrote:
> >>> On Sat, Jun 13, 2020, 16:05 Francis Tyers <fty...@prompsit.com> wrote:
> >>>
> >>>> On 2020-06-13 19:31, Xavi Ivars wrote:
> >>>>> Before anything, let me say that I like the proposal to enhance
> >>>>> the pipeline with more data (including, but not limited to, the
> >>>>> surface forms), to be able to properly do things that we're
> >>>>> currently doing in veeeery hacky (to me) and definitely
> >>>>> non-linguistic ways.
> >>>>>
> >>>>>> xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d . spa-morph
> >>>>>> ^El/el<det><def><m><sg>$ ^mango/mango<n><m><sg>/mangar<vblex><pri><p1><sg>/MANGO_FRUTA<N><M><SG>$^./.<sent>$
> >>>>>
> >>>>> In this example, we "add" semantic information to the pipeline
> >>>>> (and disambiguate via CG3) by creating a "fake lemma" needed for
> >>>>> SPA-CAT, because "mango<n>" (pan handle) and "mango_fruta<n>" are
> >>>>> translated differently in Catalan. But this, in turn, forces
> >>>>> every other language pair using Spanish to know about
> >>>>> "mango_fruta<n>" even if the translation is the same as for
> >>>>> "mango<n>".
> >>>>>
> >>>>
> >>>> What is the problem here? That "mango" has two possible lemmas and
> >>>> paradigms in Spanish?
> >>>>
> >>>> The way that I've treated that is to have mango¹ and mango², like in
> >>>> a traditional dictionary. I don't think that this requires any
> >>>> further information.
> >>>
> >>> I think Xavi's point is that there are a number of ways to approach
> >>> this, and having the option of another stream to put this extra
> >>> information could be one of them.  Imho, it is nicer in many ways
> >>> than even having (very arbitrary) superscripts (that aren't really
> >>> any better to have in a morphological analysis than _fruta).
> >>>
> >>
> >> It's following what the lexicographers do:
> >>
> >> https://dle.rae.es/?w=mango
> >>
> >> So it's following a fairly established practice.
> >>
> >> Fran
> >
> > As far as I understand the mango issue, Xavi is contemplating the
> > possibility of a semantic module that would add extra information
> > about "mango", which could then be used by other modules (especially
> > the lexical selection one). This could be used for distinguishing
> > between a handle and a fruit, but not only that. "Mango" can be the
> > fruit and the plant. One could eventually add what kind of handle it
> > is; e.g. in the RAE dictionary entry linked by Fran, the handle of a
> > knife is specifically distinguished from other handles. As Xavi
> > shows, this extra information could be added so that it can be
> > ignored by pairs that don't need it. It seems clear that a solution
> > based on being able to add any additional secondary information is
> > more versatile than "_fruta", "_2" and the like.
> >
> > Moreover, in lexical selection we have lots of lists like "fruit",
> > "building", "person", "device", etc. (and if we don't, it is because
> > of a lack of time for writing them). It would be easier if a module
> > like the one Xavi imagines could add this kind of information and it
> > could be passed along the pipeline.
> >
> > I am not a technician, nor am I a computational linguist. I don't
> > know, nor do I understand, the implications of Tanmai and Tino's
> > proposals in terms of system performance. But, from the point of view
> > of someone with some experience in developing Apertium language
> > pairs, I would love a tool that would allow adding semantic
> > information to the pipeline.
> >
> > Other kinds of contextual information that would also be useful to me
> > are things like the type of publication (a chat between friends or a
> > medical encyclopedia?), the dialect, the year of publication, etc.
> > This would be very useful for lexical selection and, sometimes, for
> > transfer rules.
> >
> > I don't know if this has helped the discussion at all or... if I've
> > completely missed the point.
> >
>
> Thanks for the comments, Hèctor. I think that this kind of information
> could certainly be useful in the pipeline. But I think that determining
> how it should be added and where it should be added is a separate issue.
>
> What would a "semantic tagging" module look like? Would it be
> rule-based? Statistical? Where would the data come from? I could
> imagine using Wikipedia to extract it.
>
> I have no objections to the development of a well-specified and
> well-designed module for doing semantic tagging.
>
> Fran
>
>
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
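
To make the "fake lemma" versus secondary-information contrast in the
quoted mango example more concrete, here is a minimal Python sketch. It
reads lexical units in the stream format shown in Xavi's output and
separates ordinary tags from extra, ignorable ones. The `<sem:fruit>`
notation and the helper names `parse_lu` and `split_tags` are
assumptions made up for illustration; they are not the syntax of any
actual proposal in this thread, and escaped characters in the stream are
not handled.

```python
import re

# One lexical unit is delimited by ^ ... $ in the Apertium stream format.
LU_RE = re.compile(r"\^(.*?)\$")
# A tag is anything between angle brackets.
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_lu(body):
    """Split 'surface/analysis1/analysis2' into (surface, [(lemma, tags)])."""
    surface, *analyses = body.split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]
        parsed.append((lemma, TAG_RE.findall(analysis)))
    return surface, parsed


def split_tags(tags):
    """Separate ordinary tags from hypothetical 'key:value' secondary tags."""
    primary = [t for t in tags if ":" not in t]
    secondary = dict(t.split(":", 1) for t in tags if ":" in t)
    return primary, secondary


# Hypothetical analysis: the fruit reading keeps the lemma "mango" and
# carries the sense as a secondary tag instead of a fake lemma.
stream = (
    "^El/el<det><def><m><sg>$ "
    "^mango/mango<n><m><sg>/mangar<vblex><pri><p1><sg>"
    "/mango<n><m><sg><sem:fruit>$"
)

units = [parse_lu(body) for body in LU_RE.findall(stream)]
surface, analyses = units[1]
lemma, tags = analyses[2]
primary, secondary = split_tags(tags)
```

A pair that does not care about the distinction would look only at
`lemma` and `primary`, so both noun readings of "mango" look identical
to it, while a pair like spa-cat could consult `secondary`.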
