Hi Nariman,

The structure of the system is more or less the same across all pairs, but some components are used in some pairs and not in others. For example, the statistical system for choosing the correct rule to apply when there is ambiguity is a work in progress and is only in a few pairs.


Your concern about breaking a system by making changes is a valid one, but GSoC students don't typically make changes to programs we have in production. When a new component is written, it is tested and introduced in a few pairs at first, and so on.


There are a number of ways to increase the quality of a system, but what is usually most urgent is expanding the dictionary and writing more transfer rules. Kazakh-Turkish would have been a nice pair for you to work on given your proficiency in both languages, but it has been getting quite a lot of attention recently, so perhaps it would be better to choose some other Turkic pair (I've been thinking about Bashkurt-Turkish).


So to recap:


For improving or creating language pairs, the tools are already there, and you will be making or improving things like a dictionary of words in both languages, rules to choose the right words, and rules to reorder and change the words so they make sense in the target language. This is akin to developing language resources and doesn't require a whole lot of programming expertise, but some scripting is useful.


If you are a hardcore programmer, you can develop a new component or improve some features of the system.


I'm sure someone has sent you this link, but here is a list of ideas for projects we'd like to do this summer: http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code


Best,

Memduh



On 14-03-2019 15:26, Daniyar Nariman via Apertium-stuff wrote:

Hi Sevilay,

In my message, I meant that Kazakh and Turkish are similar in terms of affixes and sentence structure, while Kazakh and Russian differ more. So if I increase the translation quality of the first pair by adding some functionality to the pipeline, there is a chance that the same change might not work for the second pair. So my question is: does the pipeline have to be the same for all language pairs, or can it differ?

------------------------------------------------------------------------
*From:* Sevilay Bayatlı <sevilaybaya...@gmail.com>
*Sent:* Thursday, March 14, 2019 1:13:18 PM
*To:* apertium-stuff@lists.sourceforge.net
*Subject:* Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish
Hi Daniyar,

Could you tell us how modifying some parts of the pipeline can increase accuracy for one pair and decrease it for another?

Sevilay


On Thu, Mar 14, 2019 at 11:26 AM Ilnar Salimzianov <il...@selimcan.org> wrote:




    -------- Forwarded Message --------
    Subject:        RBMT from Kazakh to Turkish
    Date:   Wed, 13 Mar 2019 19:07:42 +0000
    From:   Daniyar Nariman <n.dani...@innopolis.ru>
    To: il...@selimcan.org



    Dear Ilnar Salimzianov,


    My name is Nariman. I am a third-year bachelor's student at
    Innopolis University (Tatarstan, Russia). I am studying Data Science
    and am really interested in disciplines such as machine learning,
    natural language processing, information retrieval, etc.


    Recently I read your paper, RBMT from Kazakh to Turkish, which was
    published at EAMT 2018. It was really interesting to read. The thing
    is, I am applying to Apertium for GSoC (Google Summer of Code) this
    year, but I am still thinking about the topic I would like to work
    on. One of the topics was to bring a given language pair to
    state-of-the-art quality, and I would like to work on the
    Kazakh-Turkish pair, as Kazakh is my mother tongue and I studied
    Turkish in high school for 5 years.


    I would like to ask whether there are any restrictions on how to
    increase the quality of this pair, apart from adding a large number
    of rules or expanding the dictionary (taken for granted), for
    instance by optimizing the algorithms in the pipeline. I am asking
    because by modifying some part of the pipeline we could increase
    accuracy on our pair of languages but decrease it on another pair,
    and constructing a different pipeline for each pair is not a good
    idea in my opinion.



    Thanks in advance!


    Best Regards,

    Daniyar Nariman





    _______________________________________________
    Apertium-stuff mailing list
    Apertium-stuff@lists.sourceforge.net
    <mailto:Apertium-stuff@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/apertium-stuff




