Hi Tomohiro,

Actually, my point was that there is still a lot to be done. The work I pointed you to is a proof of concept more than anything, and it has not been integrated into Apertium.
If I were you and interested in participating in GSoC, I would have a look at those resources, try to get them running, and figure out how they work and what their limitations are. That will give you a good idea of what still needs to be done.

--
Jonathan

On Wed, Feb 26, 2020, 08:41 Tomohiro Akazawa <tomohiroakaz...@gmail.com> wrote:

> Hi Jonathan,
>
> Thank you for your feedback.
> There seem to be enough implementations for Japanese.
>
> --
> Tomohiro
>
> On Wed, 26 Feb 2020 at 22:26, Jonathan Washington <jonathan.n.washing...@gmail.com> wrote:
>
>> Hi Tommi, all,
>>
>> A couple years ago, a Swarthmore student implemented an algorithm for
>> tokenisation of spaceless orthographies using morphological transducers.
>> She used a fork of a prototype Japanese transducer developed by another of
>> my students to evaluate it.
>>
>> The work is available at the following URLs:
>>
>> https://scholarship.tricolib.brynmawr.edu/handle/10066/20002
>>
>> https://github.com/chanlon1/tokenisation
>>
>> https://github.com/chanlon1/apertium-jpn
>>
>> --
>> Jonathan
>>
>> On Wed, Feb 26, 2020, 06:38 Tomohiro Akazawa <tomohiroakaz...@gmail.com>
>> wrote:
>>
>>> Thank you for your reply.
>>> If "improving the support of Japanese on Apertium" could be a new
>>> project on GSoC, I would find the problems of the current version of
>>> Apertium and figure out the solutions for them.
>>> Thank you.
>>>
>>> On Wed, 26 Feb 2020 at 0:47, Tommi A Pirinen <tommi.antero.piri...@uni-hamburg.de> wrote:
>>>
>>>> Hi all,
>>>> one thing that might be worth considering in improving support of
>>>> Japanese in Apertium is that we currently do not have any good
>>>> generic solution for word tokenisation; this especially affects
>>>> languages like Japanese, where space- and punctuation-based
>>>> tokenisation works much worse than for European languages.
>>>> If you'd be interested in formulating a project solving the
>>>> tokenisation problem, I think it would fit Apertium GSoC quite well,
>>>> and if others agree I could (co-)mentor
>>>>
>>>> On Mon, Feb 24, 2020 at 06:12:28AM +0900, Tomohiro Akazawa wrote:
>>>> > Thank you for your reply.
>>>> > Considering there are many resources for English and Japanese,
>>>> > possibly I should change my plan.
>>>> > Thank you
>>>>
>>>> > On Sun, 23 Feb 2020, 23:58 Hèctor Alòs i Font, <hectora...@gmail.com> wrote:
>>>> >
>>>> > > Hi Tomohiro,
>>>> > >
>>>> > > Maybe it is not the 2019 version of the application form, but the
>>>> > > 2020 one (if Apertium is elected by Google as a partner
>>>> > > organisation) should not be very different from this one:
>>>> > > http://wiki.apertium.org/wiki/Top_tips_for_GSOC_applications
>>>> > > Essentially, for a pair like English and Japanese the main questions
>>>> > > probably will be:
>>>> > >
>>>> > > * reasons why Google and Apertium should sponsor it,
>>>> > > * a description of how and who it will benefit in society,
>>>> > >
>>>> > > (essentially because both English and Japanese are well-resourced
>>>> > > languages). Imho, Okinawan-Japanese would be a much more
>>>> > > Apertium-like proposal. But, of course, I may be wrong. I should
>>>> > > maybe add that for building a translator it is not absolutely
>>>> > > necessary to be proficient in the source language. If you can read
>>>> > > it and you have access to grammars, dictionaries and informants,
>>>> > > this is usually enough. But, of course, the better you know the
>>>> > > source language (not only the target one), the better.
>>>> > >
>>>> > > Hèctor
>>>> > >
>>>> > > Message from Tomohiro Akazawa <tomohiroakaz...@gmail.com> on
>>>> > > Sun, 23 Feb 2020 at 14:27:
>>>> > >
>>>> > >> Hello.
>>>> > >> My name is Tomohiro and I am a student at the University of Tokyo
>>>> > >> in Japan.
>>>> > >> Looking at Apertium's ideas list for GSoC 2020, I found "Adopt an
>>>> > >> unreleased language pair" interesting.
>>>> > >> Do you think it is possible to make a language pair between
>>>> > >> English and Japanese?
>>>> > >> Thank you very much.
>>>> > >> _______________________________________________
>>>> > >> Apertium-stuff mailing list
>>>> > >> Apertium-stuff@lists.sourceforge.net
>>>> > >> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>>
>>>> --
>>>> Doktor Tommi A Pirinen, Computational Linguist,
>>>> <https://flammie.github.io/purplemonkeydishwasher/>, Universität
>>>> Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
>>>> Entwickler. President of ACL SIGUR SIG for Uralic languages
>>>> <http://gtweb.uit.no/sigur/>.
>>>> I tend to follow inline-posting style in desktop e-mail messages.