Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish

Jonathan Washington Thu, 14 Mar 2019 16:12:12 -0700

Hello, Daniyar!  Welcome to our community!

Thanks for getting in touch with Ilnar and the rest of the Apertium
community about your project idea.


Memduh is right that Kazakh-to-Turkish MT is receiving a lot of attention
right now in Apertium, and an additional project on it would likely create
a bit of a mess.  However, I think Turkish-to-Kazakh MT (i.e., the other
direction) would be a good way for you to contribute, given your linguistic
knowledge.  The translation pair and language modules are the same, but a
lot of the work would be editing a complementary set of files:
disambiguation for Turkish and not Kazakh, and lexical selection and
structural transfer for the Turkish-Kazakh direction instead of the
Kazakh-Turkish direction.

I don't see any problems with this, but perhaps others on this list have
deeper insights.

Another thought is that our Kazakh-to-Tatar MT system is one of our oldest
"stable" Turkic pairs, but it does a poor job in the other direction.
Perhaps a coherent GSoC proposal could be assembled from making these two
existing pairs (kaz-tat and kaz-tur) stable in the opposite directions.
I'd be interested to hear what other mentors think about this.  (Knowing
Kazakh and Turkish well should make Tatar fairly easy to work with.)

Two additional little tidbits:

Regarding your question about the pipeline involved, you can take a look at
how the Apertium pipeline comes together here:
http://wiki.apertium.org/wiki/Apertium_system_architecture

This page could be updated some, but is probably still helpful as is.
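
As a rough illustration of the point about directions (a toy sketch in
Python, not Apertium's actual code), one direction of a pair can be thought
of as an ordered list of stages.  The stage list is simplified, and the
labels "kaz", "tur", and "kaz-tur" are illustrative placeholders rather
than real file or repository paths:

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Stage:
    step: str                        # what the stage does
    module: str                      # illustrative module label, not a real path
    direction: Optional[str] = None  # set only for direction-specific data


def pipeline(src: str, tgt: str, pair: str) -> List[Stage]:
    """Toy model of one translation direction as an ordered list of stages."""
    direction = f"{src}-{tgt}"
    return [
        Stage("morphological analysis", src),
        Stage("disambiguation", src),
        Stage("lexical selection", pair, direction),
        Stage("structural transfer", pair, direction),
        Stage("morphological generation", tgt),
    ]


def modules(stages: List[Stage]) -> Set[str]:
    """Return the set of modules a direction depends on."""
    return {s.module for s in stages}


kaz_tur = pipeline("kaz", "tur", pair="kaz-tur")
tur_kaz = pipeline("tur", "kaz", pair="kaz-tur")

if __name__ == "__main__":
    # Both directions draw on the same three modules...
    print(modules(kaz_tur) == modules(tur_kaz))
    # ...but the data each stage needs depends on the direction.
    for stage in tur_kaz:
        print(stage)
```

In this simplified model both directions depend on the same modules, while
the disambiguation, lexical selection, and structural transfer stages need
their own data for each direction.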

Also, I see you managed to catch Ilnar on IRC.  Feel free to stay logged in
when you can—you'll find different people available at different times.

Talk to you soon,

--
Jonathan


On Thu, 14 Mar 2019 at 14:03, Memduh Gökırmak <[email protected]> wrote:

> Hi Nariman,
>
>
> The structure of the system is more or less the same across all pairs, but
> there are some components that we use in some and don't use in others. For
> example, the statistical system for choosing the correct rule to apply when
> there is ambiguity is a work in progress, and is only in a few pairs.
>
>
> Your question regarding breaking some system by making changes is a valid
> one, but GSoC students don't typically make changes to programs we have in
> production. When a new component is written it is tested and introduced in
> a few pairs at first and so on.
>
>
> There are a number of ways to increase the quality of a system, but the
> most urgent tasks are usually expanding the dictionary and writing
> more transfer rules. Kazakh-Turkish would have been a nice domain for you
> to work on given your proficiency in both, but it has been getting quite a
> lot of attention recently and perhaps it would be better to choose some
> other Turkic pair (I've been thinking about Bashkurt-Turkish).
>
>
> So to recap:
>
>
> For improving/creating language pairs, the tools are already there and you
> will be making/improving things like a dictionary of words in both
> languages, rules to choose the right words, rules to reorder and change up
> the words so they make sense in the target language. This is something akin
> to developing language resources and doesn't require a whole lot of
> programming expertise, but some scripting is useful.
>
>
> If you are a hardcore programmer, you can develop a new component or
> improve some features of the system.
>
>
> I'm sure someone has sent you this link, but here is a list of ideas for
> projects we'd like to do this summer:
> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
>
>
> Best,
>
> Memduh
>
>
>
> On 14-03-2019 15:26, Daniyar Nariman via Apertium-stuff wrote:
>
> Hi Sevilay,
>
> In my message, I meant that the Kazakh and Turkish languages are similar in
> terms of affixes and sentence structure, while Kazakh and Russian are more
> different. So if I increase the translation quality of the first pair
> by adding some functionality to the pipeline, there is a chance
> that the same change might not work on the second pair. So the question is:
> does this pipeline have to be the same for all language pairs, or can it
> differ?
> ------------------------------
> *From:* Sevilay Bayatlı <[email protected]>
> *Sent:* Thursday, March 14, 2019 1:13:18 PM
> *To:* [email protected]
> *Subject:* Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish
>
> Hi Daniyar,
>
> Could you tell us how modifying some parts of the pipeline could increase
> accuracy on one pair and decrease it for another pair?
>
> Sevilay
>
>
> On Thu, Mar 14, 2019 at 11:26 AM Ilnar Salimzianov <[email protected]>
> wrote:
>
>>
>>
>>
>> -------- Forwarded Message --------
>> Subject:        RBMT from Kazakh to Turkish
>> Date:   Wed, 13 Mar 2019 19:07:42 +0000
>> From:   Daniyar Nariman <[email protected]>
>> To:     [email protected] <[email protected]>
>>
>>
>>
>> Dear Ilnar Salimzianov,
>>
>>
>> My name is Nariman. I am a third-year bachelor's student at
>> Innopolis University (Tatarstan, Russia). I am studying Data Science and
>> am really interested in disciplines such as machine learning, natural
>> language processing, information retrieval, etc.
>>
>>
>> Recently I read your paper, RBMT from Kazakh to Turkish, which was
>> published at EAMT 2018. It was really interesting to read. The thing is,
>> I am applying to Apertium for GSoC (Google Summer of Code) this year, but
>> I am still deciding on the topic I would like to work on. One of
>> the topics was to bring a given language pair to state-of-the-art
>> quality, and I would like to work on the Kazakh-Turkish pair, as
>> Kazakh is my mother tongue and I studied Turkish in
>> high school for 5 years.
>>
>>
>> I would like to ask whether there are any restrictions on how to increase the
>> quality of this pair, other than by adding a large number of rules or
>> expanding the dictionary (which I take for granted): for instance, by
>> optimizing the algorithms used in the pipeline. I am asking because, by
>> modifying some part of the pipeline, we could increase accuracy on our pair
>> of languages but decrease it on another pair, and constructing a different
>> pipeline for each pair is not a good idea, in my opinion.
>>
>>
>>
>> Thanks in advance!
>>
>>
>> Best Regards,
>>
>> Daniyar Nariman
>>
>>
>>
>>
>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>

