Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish

Ilnar Salimzianov Thu, 14 Mar 2019 22:57:01 -0700


On 15 March 2019 02:11:19 GMT+03:00, Jonathan Washington
<[email protected]> wrote:
>Hello, Daniyar!  Welcome to our community!
>
>Thanks for getting in touch with Ilnar and the rest of the Apertium
>community about your project idea.
>
>Memduh is right that Kazakh-to-Turkish MT is receiving a lot of attention
>right now in Apertium, and an additional project on it would likely create
>a bit of a mess.  However, I think Turkish-to-Kazakh MT (i.e., the other
>direction) would be a good way for you to contribute, given your linguistic
>knowledge.  The translation pair and language modules are the same, but a
>lot of the work would be editing a complementary set of files:
>disambiguation for Turkish and not Kazakh, and lexical selection and
>structural transfer for the Turkish-Kazakh direction instead of the
>Kazakh-Turkish direction.


+1

>I don't see any problems with this, but perhaps others on this list have
>deeper insights.
>
>Another thought is that our Kazakh-to-Tatar MT system is one of our oldest
>"stable" Turkic pairs, but it does a poor job in the other direction.
>Perhaps a coherent GSoC proposal could be assembled from making these two
>existing pairs (kaz-tat and kaz-tur) stable in the opposite directions.
>I'd be interested to hear what other mentors think about this.  (Knowing
>Kazakh and Turkish well should make Tatar fairly easy to work with.)
>
>Two additional little tidbits:
>
>Regarding your question about the pipeline involved, you can take a look at
>how the Apertium pipeline comes together here:
>http://wiki.apertium.org/wiki/Apertium_system_architecture
>
>This page could be updated some, but is probably still helpful as is.
>
>Also, I see you managed to catch Ilnar on IRC.  Feel free to stay logged in
>when you can; you'll find different people available at different times.
>
>Until we talk again,
>
>--
>Jonathan
>
>
>On Thu, 14 Mar 2019 at 14:03, Memduh Gökırmak <[email protected]> wrote:
>
>> Hi Nariman,
>>
>>
>> The structure of the system is more or less the same across all pairs, but
>> there are some components that we use in some and don't use in others. For
>> example, the statistical system for choosing the correct rule to apply when
>> there is ambiguity is a work in progress, and is only in a few pairs.
>>
>>
>> Your question regarding breaking some system by making changes is a valid
>> one, but GSoC students don't typically make changes to programs we have in
>> production. When a new component is written it is tested and introduced in
>> a few pairs at first and so on.
>>
>>
>> There are a number of ways to increase the quality of a system, but what is
>> usually most urgent is things like expanding the dictionary and writing
>> more transfer rules. Kazakh-Turkish would have been a nice domain for you
>> to work on given your proficiency in both, but it has been getting quite a
>> lot of attention recently and perhaps it would be better to choose some
>> other Turkic pair (I've been thinking about Bashkir-Turkish).
>>
>>
>> So to recap:
>>
>>
>> For improving/creating language pairs, the tools are already there and you
>> will be making/improving things like a dictionary of words in both
>> languages, rules to choose the right words, rules to reorder and change up
>> the words so they make sense in the target language. This is something akin
>> to developing language resources and doesn't require a whole lot of
>> programming expertise, but some scripting is useful.
>>
>>
>> If you are a hardcore programmer, you can develop a new component or
>> improve some features of the system.
>>
>>
>> I'm sure someone has sent you this link, but here is a list of ideas for
>> projects we'd like to do this summer:
>> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
>>
>>
>> Best,
>>
>> Memduh
>>
>>
>>
>> On 14-03-2019 15:26, Daniyar Nariman via Apertium-stuff wrote:
>>
>> Hi Sevilay,
>>
>> In my message, I meant that the Kazakh and Turkish languages are similar in
>> terms of affixes and sentence structure, while Kazakh and Russian are more
>> different. So if I increase the translation quality of the first pair
>> by adding some additional functionality to the pipeline, there is a chance
>> that the same change might not work on the second pair. Finally, the
>> question is: does this pipeline have to be the same for all language pairs,
>> or can it differ?
>> ------------------------------
>> *From:* Sevilay Bayatlı <[email protected]>
>> *Sent:* Thursday, March 14, 2019 1:13:18 PM
>> *To:* [email protected]
>> *Subject:* Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish
>>
>> Hi Daniyar,
>>
>> Could you tell us how modifying some parts of the pipeline could increase
>> accuracy on one pair and decrease it on another?
>>
>> Sevilay
>>
>>
>> On Thu, Mar 14, 2019 at 11:26 AM Ilnar Salimzianov <[email protected]>
>> wrote:
>>
>>>
>>>
>>>
>>> -------- Forwarded Message --------
>>> Subject:        RBMT from Kazakh to Turkish
>>> Date:   Wed, 13 Mar 2019 19:07:42 +0000
>>> From:   Daniyar Nariman <[email protected]>
>>> To:     [email protected] <[email protected]>
>>>
>>>
>>>
>>> Dear Ilnar Salimzianov,
>>>
>>>
>>> My name is Nariman. I am a third-year bachelor student at
>>> Innopolis University (Tatarstan, Russia). I am studying Data Science and
>>> am really interested in disciplines such as machine learning, natural
>>> language processing, information retrieval, etc.
>>>
>>>
>>> Recently I read your paper, RBMT from Kazakh to Turkish, which was
>>> published in EAMT 2018. It was really interesting to read. The thing is,
>>> I am applying to GSoC (Google Summer of Code) this year with Apertium, but
>>> I am still deciding on the topic I would like to work on. One of
>>> the topics was to bring a given language pair to state-of-the-art
>>> quality, and I would like to work on the Kazakh-Turkish pair, as
>>> Kazakh is my mother tongue and I studied Turkish in
>>> high school for 5 years.
>>>
>>>
>>> I would like to ask whether there are any restrictions on how the quality
>>> of this pair may be improved, beyond adding a large number of rules or
>>> expanding the dictionary (which I take for granted), for instance by
>>> optimizing the algorithms in the pipeline. I am asking because by modifying
>>> some part of the pipeline we can increase accuracy on our pair of
>>> languages but decrease it on another pair, and constructing a different
>>> pipeline for different pairs is not a good idea in my opinion.
>>>
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>> Best Regards,
>>>
>>> Daniyar Nariman
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
>>
>>
>>
>>

-- 
Sorry for the brevity, sent from K-9 Mail.


