Re: [Apertium-stuff] Working around monodix trimming

Mikel L. Forcada Sun, 22 Mar 2020 00:26:08 -0700

For suffixing or prefixing languages, you could expand the morphological 
dictionary and use an algorithm such as OSTIA (1) to learn morphological 
analyses for word endings.
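
As a rough illustration of learning analyses for word endings, here is a minimal sketch. It is not OSTIA: it simply counts which analysis each ending co-occurs with in an expanded dictionary and guesses by longest matching ending. The word list, the tag strings and the function names are all hypothetical, made up for the example.

```python
from collections import Counter, defaultdict

def train_suffix_guesser(pairs, max_len=4):
    """Count which analysis each word ending (up to max_len chars) occurs with."""
    counts = defaultdict(Counter)
    for surface, tags in pairs:
        for n in range(1, min(max_len, len(surface)) + 1):
            counts[surface[-n:]][tags] += 1
    return counts

def guess(counts, word, max_len=4):
    """Return the most frequent analysis for the longest known ending, or None."""
    for n in range(min(max_len, len(word)), 0, -1):
        ending = word[-n:]
        if ending in counts:
            return counts[ending].most_common(1)[0][0]
    return None

# Toy expanded dictionary (hypothetical data, not taken from a real monodix).
EXPANDED = [
    ("walked", "vblex.past"),
    ("jumped", "vblex.past"),
    ("talked", "vblex.past"),
    ("walking", "vblex.ger"),
    ("jumping", "vblex.ger"),
    ("houses", "n.pl"),
    ("tables", "n.pl"),
]
MODEL = train_suffix_guesser(EXPANDED)

print(guess(MODEL, "hanged"))   # longest known ending is "ed"
print(guess(MODEL, "xyz"))      # no known ending
```

A transducer learner such as OSTIA would generalise over the whole string rather than over a fixed window of endings, so this only shows the shape of the idea.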


Mikel

(1) Oncina, J., Garcia, P., Vidal, E. (1993). IEEE Trans. Pattern Anal. Mach. 
Intell. 15(5):448-458.

On 21 March 2020 21:12:16 CET, Tanmai Khanna <khanna.tan...@gmail.com> 
wrote:
>Guessing the morphology would definitely require some creativity, but yes,
>a guessing dictionary could be created. As mentioned, it would assign
>morphs to morphological analyses in the target language (TL). The easiest
>(and most naive) way to do this might be to take all the entries with a
>given analysis and find a common substring. It will be more complex for
>morphemes that aren't prefixes or suffixes, or that are process morphemes.
>However, a morph analyser that can assign morphs to analyses sounds like a
>good goal to work towards, and eliminating dictionary trimming is an
>essential step in that direction.
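
The naive "common substring" idea above could be sketched as follows, restricted to suffixes. This is only an illustration under assumptions: the entries, the tag strings and the function names are hypothetical and do not come from any real dictionary.

```python
import os
from collections import defaultdict

def common_suffix(words):
    """Longest suffix shared by all words (empty string if there is none)."""
    reversed_words = [w[::-1] for w in words]
    # commonprefix compares character by character, so it works on any strings.
    return os.path.commonprefix(reversed_words)[::-1]

def morphs_by_analysis(pairs):
    """Group surface forms by analysis and take the common suffix of each group."""
    groups = defaultdict(list)
    for surface, tags in pairs:
        groups[tags].append(surface)
    return {tags: common_suffix(forms) for tags, forms in groups.items()}

# Hypothetical toy entries.
REGULAR = [("walked", "vblex.past"), ("jumped", "vblex.past")]
print(morphs_by_analysis(REGULAR))

# One irregular (process-morpheme) form is enough to erase the shared suffix.
print(common_suffix(["walked", "jumped", "sang"]))
```

The second call shows why this is naive: a single form like "sang" leaves no common suffix at all, so a real guesser would need to cluster forms or use frequencies rather than require a suffix shared by every entry.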
>
>Tanmai
>
>On Sat, Mar 21, 2020 at 9:48 PM Mikel L. Forcada <m...@dlsi.ua.es> wrote:
>
>> This looks interesting.
>>
>> Note that generating target language morphology may not always be
>> possible, unless a "guessing" dictionary is created automatically from
>> both the source and target dictionaries. A "guessing" dictionary would
>> try to assign a morphological analysis to an unknown word by looking at
>> the morphology of known words in the dictionary...
>>
>> This would be easy if one could, e.g., match suffixes to morphology in a
>> suffixing language.
>>
>> Mikel
>>
>>
>> On 21/3/20 at 15:37, Tanmai Khanna wrote:
>>
>> Hey guys,
>> Dictionary trimming is the process of removing from the monolingual
>> language models (FSTs compiled from monodixes) those words and analyses
>> that have no entry in the bidix. This avoids a lot of untranslated lemmas
>> (marked with an @ when debugging) in the output, which cause problems for
>> comprehension and for post-editing.
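
To make the definition of trimming concrete, here is a toy model that treats the analyser as a plain lookup table instead of an FST. The entries, the tag strings, the simplified output format and the function names are all assumptions made for illustration; this is not how the real tools implement it.

```python
def trim(monodix, bidix_lemmas):
    """Keep only the analyser entries whose lemma has an entry in the bidix."""
    return {
        surface: (lemma, tags)
        for surface, (lemma, tags) in monodix.items()
        if lemma in bidix_lemmas
    }

def analyse(analyser, word):
    """Return 'lemma<tags>' for a known word, or '*word' for an unknown one."""
    if word in analyser:
        lemma, tags = analyser[word]
        return f"{lemma}<{tags}>"
    return f"*{word}"

# Hypothetical toy data: the monodix knows two words, the bidix only one lemma.
MONODIX = {
    "izarak": ("izara", "n.pl"),
    "izeki": ("izeki", "vblex"),
}
BIDIX_LEMMAS = {"izara"}

TRIMMED = trim(MONODIX, BIDIX_LEMMAS)
print(analyse(MONODIX, "izeki"))   # untrimmed: the analysis is available
print(analyse(TRIMMED, "izeki"))   # trimmed: the word is unknown
```

The untrimmed analyser still produces an analysis that later stages could use, while the trimmed one reports the word as unknown from the start.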
>>
>> There is a GSoC project
>> <http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Eliminate_trimming>
>> which aims to eliminate this trimming and propose a solution that keeps
>> the benefits of dictionary trimming. In this email I will summarise the
>> discussion that has taken place so far.
>>
>> By trimming the dictionary, you throw away valuable analyses of words in
>> the source language which, if preserved, could be used as context for
>> lexical selection and analysis of the input. Also, several transfer rules
>> don't match because the trimmed word is treated as unknown.
>>
>> Several solutions are possible for avoiding trimming, some of which have
>> been discussed by Unhammer here
>> <http://wiki.apertium.org/wiki/Talk:Why_we_trim>. These involve keeping
>> both the surface form of the source word and its lemma+analysis: use the
>> analysis for as long as it is needed in the pipe, and then propagate the
>> source form as an unknown word (as would happen with trimming).
>>
>> Another interesting solution that was discussed is that, instead of just
>> propagating the source surface form, we can output [source-word lemma +
>> target morphology], as shown in this example by Mikel:
>>
>> Translating from Basque to English:
>> "Andonik izarak izeki zuen" ('Andoni hung up the sheets') → 'Andoni
>> *izeki-ed the sheets'.
>>
>> This might help the comprehensibility of the output and, to some extent,
>> even its post-editability.
>>
>> If you have any significant pros, cons, or suggestions to add for this
>> project, please reply to this thread so that, if I work on this project,
>> I can do so fully informed.
>>
>> Thanks and Regards,
>> Tanmai Khanna
>>
>> --
>> *Khanna, Tanmai*
>>
>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>> --
>> Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
>> Departament de Llenguatges i Sistemes Informàtics
>> Universitat d'Alacant
>> E-03690 Sant Vicent del Raspeig
>> Spain
>> Office: +34 96 590 9776
>>
>>
>
>
>-- 
>*Khanna, Tanmai*

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
