G'day... https://github.com/GavinWz/Apertium does not solve the challenge. The point is to categorize all of Unicode, not just ASCII. I would recommend using ICU for it.
And the code is C. We use C++.

-- Tino Didriksen

On Fri, 28 Feb 2020 at 02:43, 杨伟哲 <gavinwzma...@gmail.com> wrote:
> Hi list,
>
> I’m interested in the “Robust tokenisation in lttoolbox”[1] GSoC project,
> and I’m currently writing the proposal.
>
> I have completed the code challenge listed in the project, which has been
> put on Pastebin[2]. However, I’m not quite clear on where to start with
> this project. I would much appreciate it if you could point me somewhere
> (e.g. a GitHub repo related to this project) to get started. I will also
> try to learn and solve issues there if possible.
>
> Bio: I’m a Chinese undergraduate in Software Engineering. In my freshman
> year, I joined the high-performance computing center[3] of the university
> as a research assistant. Through research and learning during that period,
> I have gained a deep understanding of software architecture and open
> source projects.
>
> [1] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Robust_tokenisation
> [2] https://github.com/GavinWz/Apertium
> [3] http://cs.wfu.edu.cn/2014/0603/c1227a33048/page.htm
>
> Regards,
>
> Weizhe Yang
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff