Hi Tommi, all,

A couple of years ago, a Swarthmore student implemented an algorithm for tokenisation of spaceless orthographies using morphological transducers. To evaluate it, she used a fork of a prototype Japanese transducer developed by another of my students.
The work is available at the following URLs:
https://scholarship.tricolib.brynmawr.edu/handle/10066/20002
https://github.com/chanlon1/tokenisation
https://github.com/chanlon1/apertium-jpn

--
Jonathan

On Wed, Feb 26, 2020, 06:38 Tomohiro Akazawa <tomohiroakaz...@gmail.com> wrote:

> Thank you for your reply.
> If "improving the support of Japanese on Apertium" could be a new project
> on GSoC, I would find the problems of the current version of Apertium and
> figure out the solutions for them.
> Thank you.
>
> On Wed, 26 Feb 2020 at 0:47, Tommi A Pirinen <tommi.antero.piri...@uni-hamburg.de> wrote:
>
>> Hi all,
>> one thing that might be worth considering in improving support of
>> Japanese in Apertium is that we currently do not have any good
>> generic solution for word tokenisation. This especially affects
>> languages like Japanese, where space- and punctuation-based tokenisation
>> is much less suitable than for European languages. If you'd be interested
>> in formulating a project solving the tokenisation problem, I think it
>> would fit Apertium GSoC quite well, and if others agree I could (co-)mentor.
>>
>> On Mon, Feb 24, 2020 at 06:12:28AM +0900, Tomohiro Akazawa wrote:
>> > Thank you for your reply.
>> > Considering there are many resources for English and Japanese, possibly I
>> > should change my plan.
>> > Thank you
>> >
>> > On Sun, 23 Feb 2020, 23:58 Hèctor Alòs i Font, <hectora...@gmail.com> wrote:
>> >
>> > > Hi Tomohiro,
>> > >
>> > > Maybe it is not the 2019 version of the application form, but the 2020 one
>> > > (if Apertium is elected by Google as a partner organisation) should not be
>> > > very different from this one:
>> > > http://wiki.apertium.org/wiki/Top_tips_for_GSOC_applications
>> > > Essentially, for a pair like English and Japanese the main questions
>> > > probably will be:
>> > >
>> > > * reasons why Google and Apertium should sponsor it,
>> > > * a description of how and who it will benefit in society,
>> > >
>> > > (essentially because both English and Japanese are well-resourced languages).
>> > > Imho, Okinawan-Japanese would be a much more Apertium-like proposal. But,
>> > > of course, I may be wrong. I should maybe add that for building a
>> > > translator it is not absolutely necessary to be proficient in the source
>> > > language. If you can read it and you have access to grammars, dictionaries
>> > > and informants, this is usually enough. But, of course, the more you know
>> > > the source language (not only the target one), the better.
>> > >
>> > > Hèctor
>> > >
>> > > Message from Tomohiro Akazawa <tomohiroakaz...@gmail.com> on Sun, 23
>> > > Feb 2020 at 14:27:
>> > >
>> > >> Hello.
>> > >> My name is Tomohiro and I am a student at the University of Tokyo in
>> > >> Japan.
>> > >> Seeing Apertium's ideas list for GSoC 2020, I found "Adopt an
>> > >> unreleased language pair" interesting.
>> > >> Do you think it is possible to make a language pair between English
>> > >> and Japanese?
>> > >> Thank you very much.
>> > >> _______________________________________________
>> > >> Apertium-stuff mailing list
>> > >> Apertium-stuff@lists.sourceforge.net
>> > >> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>> --
>> Doktor Tommi A Pirinen, Computational Linguist,
>> <https://flammie.github.io/purplemonkeydishwasher/>, Universität
>> Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
>> Entwickler. President of ACL SIGUR SIG for Uralic languages
>> <http://gtweb.uit.no/sigur/>.
>> I tend to follow inline-posting style in desktop e-mail messages.