On Sun, Mar 22, 2020 at 02:58:41PM +0530, Tanmai Khanna wrote:
> Hey Hèctor,
> You're right, the task of creating a SL Lemma + TL morph is not a trivial
> one, and as we discussed on the IRC recently, the task of eliminating
> trimming is an essential first step towards that goal.

Yes, and I like how the example in your initial email clearly shows what
we could be doing once we achieve the elimination of trimming. I also
agree that the actual guessing device is in itself a separate project,
and we can perhaps treat it as a stretch goal of sorts.

> So as for the task for this GSoC, it would probably be to first eliminate
> dictionary trimming, use the source analysis and output the source word
> surface form (as an unknown) instead of source lemma. This would give us
> the benefits of trimming without actually trimming. Then we can set the
> foundations for a morph guessing idea, which can evolve over time - and
> yes, initially it would be an optional module.

Yeah, it sounds good. For this project as well I would recommend taking
a test-driven development approach; it fits the use case well, since we
have a stable code base with a large user base who would not be happy
with any regression.

I also find the morphology guessing an interesting task. For affixing
languages (and to a certain degree of morph variation) I have earlier
developed a finite-state algorithm for making affix guessers that could
be usable (in the HFST library as guessify / affix-guessify). It's a bit
of a prototype and has issues with efficiency and stability, but the FSA
algebra should be correct if the underlying FSA library's understanding
of unknown alphabets works.

-- 
Doktor Tommi A Pirinen, Computational Linguist,
<https://flammie.github.io/purplemonkeydishwasher/>, Universität Hamburg,
Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D Entwickler.
President of ACL SIGUR SIG for Uralic languages <http://gtweb.uit.no/sigur/>.
I tend to follow inline-posting style in desktop e-mail messages.
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff