On Mon, Mar 23, 2020 at 04:46:06PM +0530, Ayush wrote:
> Dear sir,
> Actually I have quite reached nowhere while going through the lttoolbox. Can
> you please help me with making of schedule for the proposal and also what all
> things I would be working under for the task of robust tokenisation. I know
> that I have to update lttoolbox to be fully Unicode but how?

Hi,

The lttoolbox part of the code is also not my area of expertise, so it would
be good for the application to recruit a co-mentor or advisor who knows the
lttoolbox internals.

That said, I would suggest starting with just the user's point of view of
tokenisation for now. Take a handful of languages from the current Apertium
set, e.g. English, Finnish, Kazakh, Norwegian, German, and maybe a script
written without spaces if there is one. Find test cases that show how
tokenisation works currently and where it could improve, and approach the GSoC
schedule as a test-driven software engineering project. It may be hard to
spread such a schedule over a three-month timeline, but once you have
uncovered some targets this way, we can discuss which additional steps are
likely to take time.

-- 
Regards, Flammie <https://flammie.github.io>
(Please note, that I will often include my replies inline instead of top or
bottom of the mail)
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff