Hey guys,
Dictionary trimming is the process of removing from the monolingual
language models (FSTs compiled from monodixes) those words and analyses
that have no entry in the bidix. This avoids a lot of untranslated lemmas
(marked with an @ when debugging) in the output, which make the output
harder to understand and to post-edit.
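
To make the idea concrete, here is a rough Python sketch of what trimming
does. It is only a toy: the dictionaries, the tag strings, and the function
names (trim, analyse) are made up for illustration, and plain dicts stand in
for the compiled FSTs.

```python
# Toy model of dictionary trimming. The data structures and tags are
# illustrative assumptions, not Apertium's real FST format or tagset.

# Monolingual analyser: surface form -> list of (lemma, tags) analyses.
MONODIX = {
    "izarak": [("izara", "<n><pl>")],
    "izeki": [("izeki", "<vblex>")],
}

# Source lemmas that have an entry in the bilingual dictionary.
BIDIX_LEMMAS = {"izara"}


def trim(monodix, bidix_lemmas):
    """Keep only the analyses whose lemma has a bidix entry."""
    trimmed = {}
    for surface, analyses in monodix.items():
        kept = [(lemma, tags) for lemma, tags in analyses if lemma in bidix_lemmas]
        if kept:
            trimmed[surface] = kept
    return trimmed


def analyse(surface, monodix):
    """Return the analyses, or a '*'-marked unknown word if there are none."""
    analyses = monodix.get(surface)
    if not analyses:
        return "*" + surface
    return analyses
```

With the trimmed dictionary, analyse("izeki", trim(MONODIX, BIDIX_LEMMAS))
gives back the unknown word "*izeki", while the untrimmed MONODIX still
returns the verb analysis.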

There is a GSoC project
<http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Eliminate_trimming>
which aims to eliminate this trimming and propose a solution that still
keeps the benefits of dictionary trimming. In this email I will summarise
the discussion that has taken place so far.

By trimming the dictionary, you throw away valuable analyses of words in
the source language, which, if preserved, can be used as context for
lexical selection and analysis of the input. Also, several transfer rules
fail to match because the trimmed word is treated as unknown.

Several solutions are possible for avoiding trimming, some of which have
been discussed by Unhammer here
<http://wiki.apertium.org/wiki/Talk:Why_we_trim>. These involve keeping both
the surface form of the source word and its lemma+analysis: use the
analysis for as long as it is needed in the pipe, and then propagate the
source form as an unknown word (as would happen with trimming).
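
As a rough sketch of this idea (not how any existing Apertium module works),
a token could carry its surface form alongside the analysis, and only fall
back to the surface form at the point where the bidix lookup fails. The Token
class, the BIDIX dict, the tags, and lexical_transfer are all hypothetical
names for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Toy bilingual dictionary: source lemma -> target lemma (assumed data).
BIDIX = {"izara": "sheet"}


@dataclass
class Token:
    surface: str           # source surface form, kept through the pipe
    lemma: Optional[str]   # None if the analyser did not know the word
    tags: str = ""         # analysis, usable by earlier stages as context


def lexical_transfer(token, bidix):
    """Translate the lemma if possible, else emit the surface form as unknown."""
    if token.lemma is not None and token.lemma in bidix:
        return bidix[token.lemma] + token.tags
    # No bidix entry: behave as trimming would, but only at this late stage.
    return "*" + token.surface
```

Here Token("izeki", "izeki", "<vblex>") still exposes its tags to anything
that runs before lexical_transfer, and only then becomes "*izeki".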

Another interesting solution that was discussed is to output [source-word
lemma + target morphology] instead of just propagating the source surface
form, as shown in this example by Mikel:

Translating from Basque to English:
"Andonik izarak izeki zuen" ('Andoni hung up the sheets') → 'Andoni
*izeki-ed the sheets'.

This might make the output more comprehensible, and to some extent even
easier to post-edit.
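
A minimal sketch of how such an output could be assembled, assuming transfer
has already decided the target-language tag for the untranslated word. The
suffix table and generate_untranslated are hypothetical, and real target
morphology is of course far richer than a suffix lookup.

```python
# Toy English morphology: target tag -> suffix (illustrative assumption).
TARGET_SUFFIX = {"<past>": "ed", "<pl>": "s"}


def generate_untranslated(source_lemma, target_tag):
    """Attach target-language morphology to an untranslated source lemma."""
    suffix = TARGET_SUFFIX.get(target_tag)
    if suffix is None:
        # No known morphology for this tag: fall back to the bare lemma.
        return "*" + source_lemma
    return "*" + source_lemma + "-" + suffix
```

For Mikel's example, generate_untranslated("izeki", "<past>") produces
"*izeki-ed".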

If you have any significant pros, cons, or suggestions to add for this
project, please reply to this thread so that, if I work on this project,
I can do so fully informed.

Thanks and Regards,
Tanmai Khanna

-- 
*Khanna, Tanmai*