On Mon, Mar 23, 2020 at 04:46:06PM +0530, Ayush wrote:
> Dear sir,
> I have not really got anywhere while going through lttoolbox. Can
> you please help me with making a schedule for the proposal, and with what
> things I would be working on for the task of robust tokenisation? I know
> that I have to update lttoolbox to be fully Unicode, but how?

Hi,
the lttoolbox part of the code is also not my area of expertise, and it
would be good for the application to recruit a co-mentor or advisor who
knows lttoolbox internals. That said, I would suggest starting with just
the user's point of view of tokenisation for now: take a handful of
languages from the current Apertium set, e.g. English, Finnish, Kazakh,
Norwegian, German, and maybe a script written without spaces if there is
one. Collect test cases showing how tokenisation works for them currently
and where it could improve, and approach the GSoC schedule as a
test-driven software engineering project. It may be hard to spread such a
schedule over a three-month timeline, but once you have uncovered some
targets this way we can discuss which additional steps are likely to take
time.
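
A minimal sketch of what such a test-driven setup might look like, in
Python and purely for illustration. Everything in it is an assumption:
baseline_tokenise is a hypothetical regex stand-in and not lttoolbox, and
the expected token lists are example guesses, not agreed Apertium
behaviour. The point is only the shape: a table of per-language cases and
a runner that reports where the current behaviour differs from the target.

```python
import re

def baseline_tokenise(text):
    """Hypothetical stand-in tokeniser (NOT lttoolbox): runs of word
    characters, plus every other non-space character on its own."""
    return re.findall(r"\w+|[^\w\s]", text)

# (language code, input, expected tokens). The expectations are
# illustrative assumptions to be replaced by real, discussed targets.
CASES = [
    ("eng", "Dogs bark.", ["Dogs", "bark", "."]),
    ("fin", "Koira haukkuu.", ["Koira", "haukkuu", "."]),
    # Apostrophe inside a word: the stand-in splits it into three pieces.
    ("eng", "don't", ["don't"]),
    # Decomposed Unicode: "e" followed by U+0301 COMBINING ACUTE ACCENT.
    ("fra", "cafe\u0301", ["cafe\u0301"]),
]

def run_cases(tokenise, cases):
    """Return the cases where the tokeniser output differs from the target."""
    failures = []
    for lang, text, expected in cases:
        got = tokenise(text)
        if got != expected:
            failures.append((lang, text, expected, got))
    return failures

if __name__ == "__main__":
    for lang, text, expected, got in run_cases(baseline_tokenise, CASES):
        print(f"[{lang}] {text!r}: expected {expected!r}, got {got!r}")
```

Each failing case is then a candidate target for the schedule, and the
stand-in function would be swapped for a call into the real pipeline once
it is clear how to invoke it.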

-- 
Regards, Flammie <https://flammie.github.io>
(Please note, that I will often include my replies inline instead of
top or bottom of the mail)

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff