On Sun, Mar 22, 2020 at 02:58:41PM +0530, Tanmai Khanna wrote:
> Hey Hèctor,
> You're right, the task of creating a SL Lemma + TL morph is not a trivial
> one, and as we discussed on the IRC recently, the task of eliminating
> trimming is an essential first step towards that goal.

Yes, and I like how the example in your initial email clearly shows
what we could be doing once we have eliminated trimming. I also agree
that the actual guessing device is a separate project in itself, and we
can perhaps treat it as a stretch goal of sorts.

> So as for the task for this GSoC, it would probably be to first eliminate
> dictionary trimming, use the source analysis and output the source word
> surface form (as an unknown) instead of source lemma. This would give us
> the benefits of trimming without actually trimming. Then we can set the
> foundations for a morph guessing idea, which can evolve over time - and
> yes, initially it would be an optional module.


Yeah, that sounds good. For this project I would also recommend taking a
test-driven development approach. It fits the use case well, since we
have a stable code base with a large user base who would not be happy
about any regressions.
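
To make the proposed behaviour and the test-first idea concrete, here is
a minimal sketch in Python. It is not Apertium code: the dictionaries,
the function names (transfer, render) and the tag strings are all
hypothetical, and the only convention borrowed from Apertium is marking
an unknown word with a leading asterisk. The point is that a word the
analyser knows but the bilingual dictionary lacks keeps its source
analysis while still coming out as the surface form.

```python
# Toy data: every entry here is hypothetical, not taken from a real dictionary.
MONODIX = {
    "houses": ("house", ["n", "pl"]),
    "armadillos": ("armadillo", ["n", "pl"]),
}
BIDIX = {"house": "casa"}  # "armadillo" is deliberately missing


def transfer(surface):
    """Return (output word, source tags, translated?) for one source token."""
    analysis = MONODIX.get(surface)
    if analysis is None:
        # Not analysable at all: a genuine unknown word.
        return surface, [], False
    lemma, tags = analysis
    target = BIDIX.get(lemma)
    if target is None:
        # Analysed but not translatable: keep the source analysis,
        # but hand back the surface form instead of the lemma.
        return surface, tags, False
    return target, tags, True


def render(surface):
    """Format the token: translated lemma plus tags, or *surface if unknown."""
    word, tags, translated = transfer(surface)
    if translated:
        return word + "".join(f"<{t}>" for t in tags)
    return "*" + word


# Regression-style checks, written before touching the real code base.
assert render("houses") == "casa<n><pl>"
assert render("armadillos") == "*armadillos"   # surface form, not *armadillo
assert transfer("armadillos")[1] == ["n", "pl"]  # analysis is still available
```

Checks like the last three would be written first against the current
output, so that any change in what existing users see shows up as a
failing test.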

I also find the morphology guessing an interesting task. For affixing
languages (and up to a certain degree of morph variation) I have earlier
developed a finite-state algorithm for making affix guessers that could
be usable (in the HFST library as guessify / affix-guessify). It is a
bit of a prototype and has issues with efficiency and stability, but the
FSA algebra should be correct if the underlying FSA library's
understanding of unknown alphabets works.
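
For readers unfamiliar with affix guessing, the following is a toy
illustration of the general idea only. It is a plain-Python
longest-suffix lookup, not the finite-state algorithm mentioned above
and not the HFST implementation; the rule table and the function name
guess are made up for the example.

```python
# Hypothetical suffix rules: (surface suffix, lemma ending, tags).
SUFFIX_RULES = [
    ("ies", "y", ["n", "pl"]),
    ("es", "", ["n", "pl"]),
    ("s", "", ["n", "pl"]),
    ("ed", "", ["vblex", "past"]),
]


def guess(surface):
    """Return candidate (lemma, tags) analyses, longest matching suffix first."""
    candidates = []
    # Try longer suffixes first so the most specific guess is ranked on top.
    for suffix, ending, tags in sorted(SUFFIX_RULES, key=lambda r: -len(r[0])):
        # Require a non-empty stem so a bare suffix is never "analysed".
        if surface.endswith(suffix) and len(surface) > len(suffix):
            stem = surface[: len(surface) - len(suffix)]
            candidates.append((stem + ending, tags))
    return candidates


print(guess("ponies")[0])
print(guess("walked"))
```

A real guesser would weight the candidates and be compiled into a
transducer, which is where the finite-state approach earns its keep.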


-- 
Doktor Tommi A Pirinen, Computational Linguist,
<https://flammie.github.io/purplemonkeydishwasher/>, Universität
Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
Entwickler.  President of ACL SIGUR SIG for Uralic languages
<http://gtweb.uit.no/sigur/>.
I tend to follow inline-posting style in desktop e-mail messages.

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff