Re: [Apertium-stuff] Working around monodix trimming

Mikel L. Forcada Sat, 21 Mar 2020 09:17:46 -0700

This looks interesting.

Note that generating target language morphology may not always be possible, unless a "guessing" dictionary is created automatically from both the source and target dictionaries. A "guessing" dictionary would try to assign a morphological analysis to an unknown word by looking at the morphology of known words in the dictionary...

This would be easy if one could, e.g., match suffixes to morphology in a suffixing language.
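
To make the suffix idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy lexicon, the tag strings, and the function names are hypothetical and are not part of any Apertium tool. It collects the word endings seen in known forms, counts which analysis each ending goes with, and guesses an unknown word's analysis from its longest known ending.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: surface form -> morphological tags.
# A real guesser would be derived from the monodix entries instead.
KNOWN = {
    "walked": "vblex.past",
    "talked": "vblex.past",
    "jumped": "vblex.past",
    "walking": "vblex.ger",
    "talking": "vblex.ger",
    "houses": "n.pl",
    "tables": "n.pl",
}


def build_suffix_table(lexicon, max_len=4):
    """Count, for every ending of up to max_len characters, the tags it occurs with."""
    table = defaultdict(Counter)
    for form, tags in lexicon.items():
        # Leave at least one character of the form as the stem.
        for n in range(1, min(max_len, len(form) - 1) + 1):
            table[form[-n:]][tags] += 1
    return table


def guess(form, table, max_len=4):
    """Return the most frequent tags for the longest known ending, or None."""
    for n in range(min(max_len, len(form) - 1), 0, -1):
        suffix = form[-n:]
        if suffix in table:
            return table[suffix].most_common(1)[0][0]
    return None


table = build_suffix_table(KNOWN)
print(guess("parked", table))   # vblex.past (longest known ending: "ked")
print(guess("reading", table))  # vblex.ger (longest known ending: "ing")
print(guess("xyz", table))      # None (no known ending)
```

Preferring the longest ending is only one possible design choice; a real guesser would probably also need to weigh how ambiguous each ending is.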


Mikel


On 21/3/20 at 15:37, Tanmai Khanna wrote:

Hey guys,
Dictionary trimming is the process of removing from monolingual language models (FSTs compiled from monodixes) those words and analyses which don't have an entry in the bidix. This avoids a lot of untranslated lemmas (with an @ if debugging) in the output, which lead to issues with comprehension and post-editing of the output.
There is a GSoC project <http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Eliminate_trimming> which aims to eliminate this trimming and propose a solution such that you don't lose the benefits of dictionary trimming either. In this email I will summarise the discussion that has taken place up until now.
By trimming the dictionary, you throw away valuable analyses of words in the source language, which, if preserved, can be used as context for lexical selection and analysis of the input. Also, several transfer rules don't match because the word is shown as unknown.
Several solutions are possible for avoiding trimming, some of which have been discussed by Unhammer here <http://wiki.apertium.org/wiki/Talk:Why_we_trim>. These involve keeping the surface form of the source word as well as the lemma+analysis: use the analysis for as long as you need it in the pipe, and then propagate the source form as an unknown word (as would be done with trimming).
Another interesting solution that was discussed was that instead of just propagating the source surface form, we can output [source-word lemma + target morphology], as is shown in this example by Mikel:
Translating from Basque to English:
"Andonik izarak izeki zuen" ('Andoni hung up the sheets') → 'Andoni *izeki-ed the sheets'.
This might help with the comprehensibility of the output and, to some extent, even its post-editability.
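
As a rough illustration of the [source-word lemma + target morphology] idea, the Python sketch below looks a lemma up in a toy bidix and, when the lemma is missing, keeps the source lemma, marks it with * and attaches target-language morphology. The dictionary entries, the tag names, the suffix table and the `transfer` function are all hypothetical; this is not how the Apertium pipeline is implemented.

```python
# Hypothetical toy bidix: source lemma -> target lemma.
BIDIX = {"izara": "sheet"}

# Hypothetical target-side (English) suffix for each morphological tag.
SUFFIX = {"past": "ed", "pl": "s"}


def transfer(lemma, tag, bidix=BIDIX):
    """Translate a lemma if the bidix has it; otherwise keep the source
    lemma, mark it as unknown with '*', and attach target morphology."""
    suffix = SUFFIX.get(tag, "")
    if lemma in bidix:
        return bidix[lemma] + suffix
    # Lemma missing from the bidix: the source analysis is still used to
    # pick the target suffix instead of being thrown away.
    return f"*{lemma}-{suffix}" if suffix else f"*{lemma}"


print(transfer("izara", "pl"))    # sheets
print(transfer("izeki", "past"))  # *izeki-ed
```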
If you have any significant pros, cons, or suggestions to add for this project, please reply to this thread so that if I work on this project, I can do it fully informed.
Thanks and Regards,
Tanmai Khanna

--
*Khanna, Tanmai*


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


--
Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03690 Sant Vicent del Raspeig
Spain
Office: +34 96 590 9776

