On 2020-06-18 07:59, Hèctor Alòs i Font wrote:
Message from Francis Tyers <fty...@prompsit.com> on Thu., 18 June 2020 at 1:59:

On 2020-06-17 21:46, Hèctor Alòs i Font wrote:
Message from Hèctor Alòs i Font <hectora...@gmail.com> on Wed., 17 June 2020 at 23:36:

Message from Francis Tyers <fty...@prompsit.com> on Wed., 17 June 2020 at 21:12:

On 2020-06-15 17:38, Hèctor Alòs i Font wrote:


...snip...


I'd add that one of the problems with that is that these synonyms may
be polysemous. For instance, "bubota" seems to be quite widely used in
Balearic Catalan, but can mean both "scarecrow" and "ghost". Probably
only one of the two meanings could be selected as a synonym if "bubota"
is missing from a bilingual dictionary.

Yep, this is the kind of thing that people are working on at the moment
with neural machine translation. For example, when translating informal
texts, how do you make sure that you get translations for "today",
"2day", "tooday", "tday", etc., as in:


https://www.clsp.jhu.edu/workshops/19-workshop/improving-translation-of-informal-language/
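To make the variant-matching problem concrete, here is a deliberately naive, non-neural sketch using the standard library's fuzzy matching. The vocabulary and the cutoff value are invented for illustration; real systems use learned models rather than edit-distance heuristics like this:

```python
# Naive sketch: map informal spellings onto a list of known standard
# forms with stdlib fuzzy matching (difflib). Vocabulary is invented.
from difflib import get_close_matches

VOCAB = ["today", "tomorrow", "yesterday"]

def normalise(token, cutoff=0.6):
    """Return the closest standard spelling for an informal token, if any."""
    matches = get_close_matches(token.lower(), VOCAB, n=1, cutoff=cutoff)
    return matches[0] if matches else token

for informal in ["2day", "tooday", "tday"]:
    print(informal, "->", normalise(informal))
```

All three variants land on "today" here, but the approach breaks down quickly ("2moro", "l8r"), which is why the workshop above turns to learned representations instead.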

This kind of problem is typical, and indeed very frequent, for languages
with no standard or a weak one. For instance, in Arpitan, alongside the
standard ending "ament" in many nouns and adverbs, I currently find
dozens of "ement" and even "èment". Similarly, I find "è" instead of
"ê", or the opposite, and "a" instead of "â", or the opposite. It's a
big mess when I get "real" texts from the net. But defining in the
monodix that every "ê" can be "è", and vice versa, and that every "â"
can be "a", and vice versa, would create a huge number of homonyms that
would make disambiguation almost impossible (so I won't do it).

Yes, I found the same in K'iche'. One of the things that can be done in
this case is to have a "spellrelax" transducer which is composed on top
of the other transducer.
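A toy sketch of the "spellrelax" idea (not the real HFST/Apertium machinery, and with invented lexicon entries and analyses): instead of duplicating entries in the lexicon, a relaxation relation maps variant graphemes onto the standard ones, and lookup goes through the composition of the two:

```python
# Sketch of composing a spell-relaxation step on top of a lexicon.
# Entries, tags, and the RELAX mapping are hypothetical.
from itertools import product

# Lexicon keyed by standard orthography.
LEXICON = {
    "fenêtra": "fenêtra<n><f><sg>",
    "pâla": "pâla<n><f><sg>",
}

# Each grapheme seen in "real" texts may relax to one or more standard ones.
RELAX = {"è": ["è", "ê"], "a": ["a", "â"]}

def relaxed_variants(form):
    """All spellings reachable by relaxing each character independently."""
    choices = [RELAX.get(ch, [ch]) for ch in form]
    return ("".join(combo) for combo in product(*choices))

def analyse(form):
    """Lookup composed with the relaxation: try every relaxed spelling."""
    return [LEXICON[v] for v in relaxed_variants(form) if v in LEXICON]

print(analyse("fenètra"))  # the non-standard "è" still reaches the entry
print(analyse("pala"))     # likewise "a" for "â"
```

Because the relaxation is applied only at lookup time, the lexicon itself stays free of the homonym-creating duplicates that would plague disambiguation if the variants were listed in the monodix directly.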

So, this kind of improvement may help translators of under-resourced
languages... if enormous corpora are not required to learn the "rules".

Well, enormous corpora are not a problem, so long as they are not required
in the under-resourced language. If a large French corpus can be used
to improve Arpitan, I don't see it as a problem.

Serge Sharoff is doing interesting things with embeddings and syncretism:

http://corpus.leeds.ac.uk/serge/publications/2019-jnle.pdf

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
