I think the last solution mentioned sounds best.

On Sat, Mar 21, 2020, 07:38 Tanmai Khanna <khanna.tan...@gmail.com> wrote:
> Hey guys,
>
> Dictionary trimming is the process of removing those words and their
> analyses from monolingual language models (FSTs compiled from monodixes)
> which don't have an entry in the bidix. This avoids a lot of untranslated
> lemmas (with an @ if debugging) in the output, which lead to issues with
> comprehension and with post-editing the output.
>
> There is a GSoC project
> <http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Eliminate_trimming>
> which aims to eliminate this trimming and propose a solution that still
> keeps the benefits of dictionary trimming. In this email I will summarize
> the discussion that has taken place so far.
>
> By trimming the dictionary, you throw away valuable analyses of words in
> the source language, which, if preserved, could be used as context for
> lexical selection and analysis of the input. Also, several transfer rules
> don't match because the word is shown as unknown.
>
> Several solutions are possible for avoiding trimming, some of which have
> been discussed by Unhammer here
> <http://wiki.apertium.org/wiki/Talk:Why_we_trim>. These involve keeping
> both the surface form of the source word and its lemma+analysis: use the
> analysis for as long as it is needed in the pipeline, then propagate the
> source form as an unknown word (as trimming would).
>
> Another interesting solution that was discussed is that, instead of just
> propagating the source surface form, we can output [source-word lemma +
> target morphology], as shown in this example by Mikel:
>
> Translating from Basque to English:
> "Andonik izarak izeki zuen" ('Andoni hung up the sheets') → 'Andoni
> *izeki-ed the sheets'.
>
> This might help the comprehensibility of the output and, to some extent,
> even its post-editability.
>
> If you have any significant pros, cons, or suggestions to add for this
> project, you're requested to reply to this thread so that if I work on this
> project, I can do it fully informed.
>
> Thanks and Regards,
> Tanmai Khanna
>
> --
> *Khanna, Tanmai*
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
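To make the difference between trimming and the two proposed alternatives concrete, here is a minimal Python sketch. Everything in it is hypothetical and illustrative: the `Analysis` type, `trim`, `transfer`, the one-entry toy bidix and the tiny English past-tense generator are not Apertium's data structures or APIs, and the analysis of "izeki zuen" is simplified to a single past-tense verb "izeki". It shows that trimming drops the analysis outright, whereas keeping it lets the last stage choose between emitting the source surface form as an unknown word or Mikel's source lemma + target morphology.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    surface: str           # source surface form as it appeared in the input
    lemma: str             # source lemma from the (hypothetical) monodix
    tags: tuple[str, ...]  # morphological tags, e.g. ("vblex", "past")


def trim(monodix: dict[str, list[Analysis]],
         bidix: dict[str, str]) -> dict[str, list[Analysis]]:
    """Trimming: keep only analyses whose lemma has a bidix entry."""
    trimmed: dict[str, list[Analysis]] = {}
    for surface, analyses in monodix.items():
        kept = [a for a in analyses if a.lemma in bidix]
        if kept:
            trimmed[surface] = kept
    return trimmed


# Toy target-language generator (hypothetical): it only knows the past tense.
IRREGULAR_PAST = {"see": "saw"}


def generate(lemma: str, tags: tuple[str, ...], foreign: bool = False) -> str:
    if "past" not in tags:
        return lemma
    if foreign:
        # Untranslated source lemma: attach the target suffix with a hyphen.
        return lemma + "-ed"
    return IRREGULAR_PAST.get(lemma, lemma + "ed")


def transfer(a: Analysis, bidix: dict[str, str], fallback: str = "surface") -> str:
    """Translate one word whose analysis was kept through earlier stages."""
    target = bidix.get(a.lemma)
    if target is not None:
        return generate(target, a.tags)
    if fallback == "surface":
        # Same output trimming would give: the source form as an unknown word.
        return "*" + a.surface
    # Alternative discussed above: source lemma + target morphology.
    return "*" + generate(a.lemma, a.tags, foreign=True)


if __name__ == "__main__":
    bidix = {"ikusi": "see"}  # hypothetical one-entry bidix
    izeki = Analysis("izeki", "izeki", ("vblex", "past"))
    print(trim({"izeki": [izeki]}, bidix))                # analysis is gone
    print(transfer(izeki, bidix))                         # source form, marked unknown
    print(transfer(izeki, bidix, fallback="lemma+morph"))  # lemma + target morphology
```

Because the analysis survives until `transfer`, the choice of fallback is made only at the very end, so earlier stages such as lexical selection and transfer rules could still see the word's tags.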