If we do have a new election, can we vote on the new bylaws first, so we use STV?
On Sun, Jun 14, 2020, 04:01 Francis Tyers <fty...@prompsit.com> wrote:

> On 2020-06-14 11:51, Hèctor Alòs i Font wrote:
> > Message from Francis Tyers <fty...@prompsit.com> on Sun, 14 Jun
> > 2020 at 10:32:
> >
> >> On 2020-06-13 23:18, Jonathan Washington wrote:
> >>> On Sat, Jun 13, 2020, 16:05 Francis Tyers <fty...@prompsit.com>
> >>> wrote:
> >>>
> >>>> On 2020-06-13 19:31, Xavi Ivars wrote:
> >>>>> Before anything, let me say that I like the proposal to enhance
> >>>>> the pipeline with more data (including, but not limited to, the
> >>>>> surface forms), to be able to properly do things that we're
> >>>>> currently doing in veeeery hacky (to me) and definitely
> >>>>> non-linguistic ways.
> >>>>>
> >>>>>> xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d . spa-morph
> >>>>>> ^El/el<det><def><m><sg>$
> >>>>>> ^mango/mango<n><m><sg>/mangar<vblex><pri><p1><sg>/MANGO_FRUTA<N><M><SG>$^./.<sent>$
> >>>>>
> >>>>> In this example, we "add" semantic information to the pipeline
> >>>>> (and disambiguate via CG3) by creating a "fake lemma" needed for
> >>>>> SPA-CAT, because "mango<n>" (pan handle) and "mango_fruta<n>"
> >>>>> are translated differently in Catalan. But this, in turn, forces
> >>>>> every other language pair using Spanish to know about
> >>>>> "mango_fruta<n>" even if the translation is the same as for
> >>>>> "mango<n>".
> >>>>
> >>>> What is the problem here? That "mango" has two possible lemmas
> >>>> and paradigms in Spanish?
> >>>>
> >>>> The way that I've treated that is to have mango¹ and mango², like
> >>>> in a traditional dictionary. I don't think that this requires any
> >>>> further information.
> >>>
> >>> I think Xavi's point is that there are a number of ways to
> >>> approach this, and having the option of another stream to put this
> >>> extra information in could be one of them. Imho, it is nicer in
> >>> many ways than even having (very arbitrary) superscripts (that
> >>> aren't really any better to have in a morphological analysis than
> >>> _fruta).
> >>
> >> It's following what the lexicographers do:
> >>
> >> https://dle.rae.es/?w=mango
> >>
> >> So it's following a fairly established practice.
> >>
> >> Fran
> >
> > As far as I understand the mango issue, Xavi is contemplating the
> > possibility of a semantic module which would add extra information
> > about "mango" that other modules (especially the lexical selection
> > one) could use. This could be used to distinguish between a handle
> > and a fruit, but not only that. "Mango" can be the fruit and the
> > plant. One could eventually add what kind of handle it is, e.g. in
> > the RAE dictionary entry Fran linked, the handle of a knife is
> > specifically distinguished from other handles. As Xavi shows, this
> > extra information could be added so that it can be ignored by pairs
> > that don't need it. It seems clear that a solution based on being
> > able to add any additional secondary information is more versatile
> > than "_fruta", "_2" and the like.
> >
> > Moreover, in lexical selection we have lots of lists like "fruit",
> > "building", "person", "device", etc. (and if we don't, it is because
> > of a lack of time for writing them). It would be easier if a module
> > like the one Xavi imagines could add this kind of information and it
> > could be passed through the pipeline.
> >
> > I am not a technician, nor am I a computational linguist. I don't
> > know, nor do I understand, the implications of Tanmai and Tino's
> > proposals in terms of system performance. But, from the point of
> > view of someone with some experience in developing Apertium language
> > pairs, I would love some tool that would allow adding semantic
> > information to the pipeline.
> >
> > Other kinds of contextual information that would also be useful to
> > me are things like the type of publication (a chat between friends
> > or a medical encyclopedia?), the dialect, the year of publication,
> > etc. It would work very well for lexical selection and, sometimes,
> > for transfer rules.
> >
> > I don't know if this has helped the discussion at all or... whether
> > I've completely missed the mark.
>
> Thanks for the comments, Hèctor. I think that this kind of information
> could certainly be useful in the pipeline. But I think that
> determining how it should be added and where it should be added is a
> separate issue.
>
> What would a "semantic tagging" module look like? Would it be
> rule-based? Statistical? Where would the data come from? I could
> imagine using Wikipedia to extract it.
>
> I have no objections to the development of a well-specified and
> well-designed module for doing semantic tagging.
>
> Fran
>
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff