Hello, I am thinking of working on the integration of apertium-iii into
apertium-jpn, as Jonathan-san suggested. Do I need language data for it?
I have already installed the dev tools locally.

Also, I’ve found an issue in apertium-jpn, and I wonder whether I should
work on it as something like a coding challenge.

Cheers,

*Sorry for the inconvenience of asking by email. IRC seems to be behaving
oddly for my account right now.
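
To make the tokenisation problem from the quoted messages below concrete, here is a minimal sketch, not the apertium-iii tokeniser and not Apertium's actual LRLM code. It assumes a toy, hypothetical lexicon (a plain Python set standing in for the transducer) and compares greedy left-to-right longest match with an exhaustive search over all segmentations of a spaceless string.

```python
# Toy lexicon: hypothetical entries chosen only to create ambiguity.
# A real system would consult the morphological transducer instead.
LEXICON = {"この", "先", "先生", "生き", "のこる", "きのこ", "る"}


def longest_match(text, lexicon=LEXICON):
    """Greedy left-to-right longest match over a spaceless string.

    Returns a list of tokens, or None if no lexicon entry matches at
    some position (the greedy choice never backtracks).
    """
    max_len = max(len(w) for w in lexicon)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            return None  # stuck: nothing in the lexicon starts here
    return tokens


def all_segmentations(text, lexicon=LEXICON):
    """Every way to split text into lexicon entries.

    Exponential in the worst case; fine for a toy example only.
    """
    if not text:
        return [[]]
    max_len = max(len(w) for w in lexicon)
    results = []
    for j in range(1, min(len(text), max_len) + 1):
        head = text[:j]
        if head in lexicon:
            for rest in all_segmentations(text[j:], lexicon):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    sentence = "この先生きのこる"
    print("greedy:", longest_match(sentence))
    for seg in all_segmentations(sentence):
        print("candidate:", seg)
```

With this toy lexicon the greedy pass commits to one reading, while the exhaustive search finds two, so some extra signal (weights, statistics, or a hybrid of both) is needed to choose between them.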

On Mon, 27 Feb 2023 at 01:08, Jonathan Washington <
jonathan.n.washing...@gmail.com> wrote:

> Hi Eiji-san,
>
> There's also the tokeniser used for Nuosu, which uses the transducer
> itself to tokenise:
> https://github.com/apertium/apertium-iii
>
> I believe this is a later implementation of what's described in the thesis
> sent by Kevin in [2].
>
> This method has some downsides, but it also has some advantages over a
> statistical model.  Perhaps a way to get started would be to explore the
> pros and cons of each approach, and think about what a hybrid model could
> achieve.  It would be good to join the IRC channel to discuss all this with
> the mentors.
>
> Another good way to get started (and it would help you do the above too)
> would be to integrate the tokeniser from apertium-iii into apertium-jpn:
> https://github.com/apertium/apertium-jpn
>
> You would need to modify the Makefile.am, the modes.xml file, drop in the
> tokeniser script, and that's about it?  Then see if you can get it to
> analyse text without spaces (test it first with the same text,
> hand-tokenised, to see what the output is).  Again, come to IRC for
> guidance.
>
> The tokeniser.py script is a bit slow, mainly because of Python string
> processing.  Rewriting it in C/C++ would be useful, and also a good way to
> get a better handle on how it works.
>
> --
> Jonathan
>
>
> On Fri, Feb 24, 2023, 13:03 Eiji Miyamoto <motopo...@gmail.com> wrote:
>
>> Thank you for your reply. The project seems cool to work on for GSoC 2023,
>> and I would like to participate in it. I reckon there are two tasks on the
>> page; could you tell me where to start?
>>
>> On Fri, 24 Feb 2023 at 08:20, Kevin Brubeck Unhammer <unham...@fsfe.org>
>> wrote:
>>
>>> > I'd like to participate in Google Summer of Code 2023 at Apertium.
>>> > In particular, I'm interested in adding a new language pair, and I am
>>> > thinking of adding Japanese-English as I speak Japanese. I took an
>>> > online summer school at Tokyo University on natural language processing
>>> > before.
>>> > Could you tell me more about the project?
>>>
>>> Hi,
>>>
>>> Getting some support for Japanese would be great! I'm not sure if you
>>> saw the whole IRC discussion, but what we really need in that regard is
>>> support for the *tokenisation* step, where our regular methods[1] fail
>>> us, since the text might have no spaces and lots of
>>> tokenisation-ambiguity. There has been some prior work[2] and it's
>>> already listed as a potential GSoC project[3].
>>>
>>> Support for anything-Japanese depends on tokenisation. It's also a big
>>> enough job that it would qualify as a full GSoC project, so if you were
>>> hoping for jpn-eng in a summer you will be disappointed (but having a
>>> toy language pair to test with would help!). On the other hand, if we
>>> get good spaceless tokenisation we open up the possibility for not just
>>> Japanese, but Thai, Lao, Chinese etc. – and of course all those writing
>>> systems used before the invention of the space character :)
>>>
>>> regards,
>>> Kevin
>>>
>>> [1] https://wiki.apertium.org/wiki/LRLM
>>> [2] http://hdl.handle.net/10066/20002
>>> [3]
>>> https://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in/Tokenisation_for_spaceless_orthographies
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> Apertium-stuff@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
