Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish

Memduh Gökırmak Thu, 14 Mar 2019 11:03:47 -0700

Hi Nariman,

The structure of the system is more or less the same across all pairs, but there are some components that we use in some pairs and not in others. For example, the statistical system for choosing the correct rule to apply when there is ambiguity is a work in progress and is only used in a few pairs.

Your question about breaking some part of the system by making changes is a valid one, but GSoC students don't typically make changes to programs we have in production. When a new component is written, it is tested and introduced in a few pairs first, and so on.

There are a number of ways to increase the quality of a system, but what is usually most urgent is things like expanding the dictionary and writing more transfer rules. Kazakh-Turkish would have been a nice domain for you to work on given your proficiency in both languages, but it has been getting quite a lot of attention recently, and it might be better to choose some other Turkic pair (I've been thinking about Bashkurt-Turkish).



So to recap:

For improving or creating language pairs, the tools are already there, and you will be making or improving things like a dictionary of words in both languages, rules to choose the right words, and rules to reorder and change up the words so they make sense in the target language. This is something akin to developing language resources and doesn't require a whole lot of programming expertise, though some scripting is useful.

If you are a hardcore programmer, you can develop a new component or improve some features of the system.

I'm sure someone has sent you this link, but here is a list of ideas for projects we'd like to do this summer: http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code



Best,

Memduh



On 14-03-2019 15:26, Daniyar Nariman via Apertium-stuff wrote:


Hi Sevilay,

In my message, I meant that Kazakh and Turkish are similar in terms of affixes and sentence structure, while Kazakh and Russian are more different. So if I increase the translation quality of the first pair by adding some functionality to the pipeline, there is a chance that the same change might not work for the second pair. Finally, my question is: does this pipeline have to be the same for all language pairs, or can it differ?


------------------------------------------------------------------------
*From:* Sevilay Bayatlı <sevilaybaya...@gmail.com>
*Sent:* Thursday, March 14, 2019 1:13:18 PM
*To:* apertium-stuff@lists.sourceforge.net
*Subject:* Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish
Hi Daniyar,

Could you tell us how modifying some parts of the pipeline could increase accuracy on one pair and decrease it for another pair?


Sevilay

On Thu, Mar 14, 2019 at 11:26 AM Ilnar Salimzianov <il...@selimcan.org> wrote:





    -------- Forwarded Message --------
    Subject:        RBMT from Kazakh to Turkish
    Date:   Wed, 13 Mar 2019 19:07:42 +0000
    From:   Daniyar Nariman <n.dani...@innopolis.ru>
    To:     il...@selimcan.org



    Dear Ilnar Salimzianov,


    My name is Nariman. I am a third-year bachelor's student at
    Innopolis University (Russia, Tatarstan). I am studying Data Science
    and am really interested in disciplines such as machine learning,
    natural language processing, information retrieval, etc.


    Recently I read your paper, RBMT from Kazakh to Turkish, which was
    published in EAMT 2018. It was really interesting to read. The thing
    is, I am applying to GSoC (Google Summer of Code) this year with
    Apertium, but I am still thinking about which topic I would like to
    work on. One of the topics was to bring a given language pair to
    state-of-the-art quality, and I would like to work on the
    Kazakh-Turkish pair, as Kazakh is my mother tongue and I studied
    Turkish in high school for 5 years.


    I would like to ask if there are any restrictions on how to increase
    the quality of this pair, leaving aside adding a large number of rules
    or expanding the dictionary (taken for granted). For instance, could I
    optimize the algorithms in the pipeline? I am asking this question
    because by modifying some part of the pipeline, we could increase
    accuracy on our pair of languages but decrease it on another pair, and
    constructing a different pipeline for each pair is not a good idea, in
    my opinion.



    Thanks in advance!


    Best Regards,

    Daniyar Nariman





    _______________________________________________
    Apertium-stuff mailing list
    Apertium-stuff@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/apertium-stuff

