Yes, I saw your changes to the idea page a few days ago, and I have
updated my proposal accordingly.

I have read some of the source code for lexical analysis, such as
lt-proc.cc, state.cc, and fst_processor.cc, especially
FSTProcessor::analysis() and State::filterFinals(). I think these may be
where tokenization happens, and so the parts that would need to be
modified.

Of course, as a Chinese student, I would also be very happy to work
on CJK. We can keep discussing tweaks to the plan and the other
details.

Weizhe

On Fri, Mar 27, 2020 at 3:50 AM Tommi A Pirinen <
tommi.antero.piri...@uni-hamburg.de> wrote:

> On Thu, Mar 26, 2020 at 11:45:41PM +0800, 杨伟哲 wrote:
> > Hi Francis and Flammie,
> >
> > I have finished the draft of my proposal for "Robust tokenization in
> > lttoolbox".
> > Could you please review it for me? I would very much appreciate your
> > feedback and suggestions.
> >
> > Google Docs link:
> >
> https://docs.google.com/document/d/1nHSR67u1HOO7ZhE5ulEn18ib3GKT31t958xp83Lbdqk/edit?usp=sharing
>
> Hi Weizhe,
>
> Sorry I haven't answered earlier; the coding challenge looks OK. I
> updated the idea page about a week ago; did you check the new version?
> Also, I think we were talking earlier about Chinese languages in
> apertium? If you have experience with this, I would be happy to tie in
> a strategy for CJK or similar tokenisations to this project. That might
> also involve some tweaks to the planned tokenisation. As for the plan,
> it seems realistic.
>
> Do you feel that you know which parts of the apertium pipeline to
> modify for the project? As I don't have such in-depth knowledge of the
> apertium codebase, it would be very important to get feedback from, or
> a co-mentor with, that knowledge.
>
> --
> Doktor Tommi A Pirinen, Computational Linguist,
> <https://flammie.github.io/purplemonkeydishwasher/>, Universität
> Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
> Entwickler.  President of ACL SIGUR SIG for Uralic languages
> <http://gtweb.uit.no/sigur/>.
> I tend to follow inline-posting style in desktop e-mail messages.
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
