Thank you for your reply. The project seems cool to work on for GSoC 2023, and I would like to participate in it. As I understand it, there are two tasks on the page; could you tell me where to start?
On Fri, 24 Feb 2023 at 08:20, Kevin Brubeck Unhammer <unham...@fsfe.org> wrote:
> > I'd like to participate in Google Summer of Code 2023 at Apertium.
> > In particular, I'm interested in adding a new language pair, and I am
> > thinking of adding Japanese-English, as I speak Japanese. I previously
> > took an online summer school on natural language processing at Tokyo
> > University.
> > Could you tell me more about the project?
>
> Hi,
>
> Getting some support for Japanese would be great! I'm not sure if you
> saw the whole IRC discussion, but what we really need in that regard is
> support for the *tokenisation* step, where our regular methods[1] fail
> us, since the text might have no spaces and lots of tokenisation
> ambiguity. There has been some prior work[2], and it's already listed
> as a potential GSoC project.
>
> Support for anything Japanese depends on tokenisation. It's also a big
> enough job that it would qualify as a full GSoC project, so if you were
> hoping for jpn-eng in a summer you will be disappointed (but having a
> toy language pair to test with would help!). On the other hand, if we
> get good spaceless tokenisation, we open up the possibility for not just
> Japanese, but Thai, Lao, Chinese, etc. – and of course all those writing
> systems used before the invention of the space character :)
>
> regards,
> Kevin
>
> [1] https://wiki.apertium.org/wiki/LRLM
> [2] http://hdl.handle.net/10066/20002
> [3] https://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in/Tokenisation_for_spaceless_orthographies
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff