On 15 March 2019 02:11:19 GMT+03:00, Jonathan Washington <jonathan.n.washing...@gmail.com> wrote:
>Hello, Daniyar! Welcome to our community!
>
>Thanks for getting in touch with Ilnar and the rest of the Apertium
>community about your project idea.
>
>Memduh is right that Kazakh-to-Turkish MT is receiving a lot of
>attention right now in Apertium, and an additional project on it would
>likely create a bit of a mess. However, I think Turkish-to-Kazakh MT
>(i.e., the other direction) would be a good way for you to contribute,
>given your linguistic knowledge. The translation pair and language
>modules are the same, but a lot of the work would be editing a
>complementary set of files: disambiguation for Turkish and not Kazakh,
>and lexical selection and structural transfer for the Turkish-Kazakh
>direction instead of the Kazakh-Turkish direction.
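
To make the split described above concrete, here is a minimal Python sketch of which resources are shared by both directions and which are edited per direction. The function name and the dictionary keys are hypothetical labels chosen for illustration; they are not Apertium file names or tools.

```python
from typing import Dict


def direction_specific_work(source: str, target: str) -> Dict[str, str]:
    """Describe which resources are shared and which are edited per direction.

    Hypothetical helper: the keys are descriptive labels, not Apertium file names.
    """
    shared = " and ".join(sorted([source, target]))
    return {
        # Shared by both directions of the pair.
        "language modules": f"shared ({shared})",
        "translation pair": f"shared ({shared})",
        # Edited separately for each direction.
        "disambiguation": f"for {source}",
        "lexical selection": f"{source}-to-{target}",
        "structural transfer": f"{source}-to-{target}",
    }


tur_kaz = direction_specific_work("Turkish", "Kazakh")
kaz_tur = direction_specific_work("Kazakh", "Turkish")
```

Comparing `tur_kaz` with `kaz_tur` shows the same shared entries but complementary direction-specific ones, which is the reason the two projects would mostly touch different files.
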
+1

>I don't see any problems with this, but perhaps others on this list
>have deeper insights.
>
>Another thought is that our Kazakh-to-Tatar MT system is one of our
>oldest "stable" Turkic pairs, but it does a poor job in the other
>direction. Perhaps a coherent GSoC proposal could be assembled from
>making these two existing pairs (kaz-tat and kaz-tur) stable in the
>opposite directions. I'd be interested to hear what other mentors think
>about this. (Knowing Kazakh and Turkish well should make Tatar fairly
>easy to work with.)
>
>Two additional little tidbits:
>
>Regarding your question about the pipeline involved, you can take a
>look at how the Apertium pipeline comes together here:
>http://wiki.apertium.org/wiki/Apertium_system_architecture
>
>This page could be updated some, but is probably still helpful as is.
>
>Also, I see you managed to catch Ilnar on IRC. Feel free to stay
>logged in when you can; you'll find different people available at
>different times.
>
>Talk to you later,
>
>--
>Jonathan
>
>
>Thu, 14 Mar 2019 at 14:03, Memduh Gökırmak <memd...@gmail.com>:
>
>> Hi Nariman,
>>
>> The structure of the system is more or less the same across all
>> pairs, but there are some components that we use in some and don't
>> use in others. For example, the statistical system for choosing the
>> correct rule to apply when there is ambiguity is a work in progress,
>> and is only in a few pairs.
>>
>> Your question regarding breaking some system by making changes is a
>> valid one, but GSoC students don't typically make changes to programs
>> we have in production. When a new component is written, it is tested
>> and introduced in a few pairs at first, and so on.
>>
>> There are a number of ways to increase the quality of a system, but
>> what is usually most urgent is things like expanding the dictionary
>> and writing more transfer rules.
>> Kazakh-Turkish would have been a nice domain for you to work on given
>> your proficiency in both, but it has been getting quite a lot of
>> attention recently and perhaps it would be better to choose some
>> other Turkic pair (I've been thinking about Bashkir-Turkish).
>>
>> So to recap:
>>
>> For improving/creating language pairs, the tools are already there
>> and you will be making/improving things like a dictionary of words in
>> both languages, rules to choose the right words, and rules to reorder
>> and change up the words so they make sense in the target language.
>> This is something akin to developing language resources and doesn't
>> require a whole lot of programming expertise, but some scripting is
>> useful.
>>
>> If you are a hardcore programmer, you can develop a new component or
>> improve some features of the system.
>>
>> I'm sure someone has sent you this link, but here is a list of ideas
>> for projects we'd like to do this summer:
>> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
>>
>> Best,
>>
>> Memduh
>>
>>
>> On 14-03-2019 15:26, Daniyar Nariman via Apertium-stuff wrote:
>>
>> Hi Sevilay,
>>
>> In my message, I meant that the Kazakh and Turkish languages are
>> similar in terms of affixes and sentence structure, while Kazakh and
>> Russian are more different. So if I increase the translation quality
>> of the first pair by adding some functionality to the pipeline, there
>> is a chance that the same might not work on the second pair. Finally,
>> the question is: does this pipeline have to be the same for all
>> language pairs, or can it differ?
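
As a rough illustration of Memduh's point that the structure is largely shared while some components are used only in some pairs, here is a minimal Python sketch. The stage list is simplified to the components named in this thread, the placeholder stage functions only record that they ran, and the position of the optional "statistical rule selection" stage is an arbitrary assumption; none of this is Apertium's actual configuration.

```python
from typing import Callable, Dict, List, Optional

Stage = Callable[[List[str]], List[str]]

# Simplified, illustrative stage names shared by every pair in this sketch.
SHARED_STAGES: List[str] = [
    "disambiguation",
    "dictionary lookup",
    "lexical selection",
    "structural transfer",
]


def make_stage(name: str) -> Stage:
    """Return a placeholder stage that appends its name to a trace."""
    def stage(trace: List[str]) -> List[str]:
        return trace + [name]
    return stage


def build_pipeline(extra_stages: Optional[Dict[str, str]] = None) -> List[Stage]:
    """Build a pipeline from the shared stages plus optional per-pair extras.

    extra_stages maps a new stage name to the shared stage it should follow.
    """
    extra_stages = extra_stages or {}
    names: List[str] = []
    for shared in SHARED_STAGES:
        names.append(shared)
        for extra, after in extra_stages.items():
            if after == shared:
                names.append(extra)
    return [make_stage(name) for name in names]


def run(pipeline: List[Stage]) -> List[str]:
    """Run every stage in order and return the trace of stage names."""
    trace: List[str] = []
    for stage in pipeline:
        trace = stage(trace)
    return trace


# A pair that uses only the shared stages.
plain_pair = build_pipeline()
# A pair that also enables an optional component (position is an assumption).
experimental_pair = build_pipeline({"statistical rule selection": "lexical selection"})
```

Both pipelines come from the same shared stage list, so enabling an optional component for one pair leaves the other pair's pipeline unchanged.
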
>> ------------------------------
>> *From:* Sevilay Bayatlı <sevilaybaya...@gmail.com>
>> *Sent:* Thursday, March 14, 2019 1:13:18 PM
>> *To:* apertium-stuff@lists.sourceforge.net
>> *Subject:* Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish
>>
>> Hi Daniyar,
>>
>> Could you tell us how modifying some parts of the pipeline can
>> increase accuracy on one pair and decrease it for another pair?
>>
>> Sevilay
>>
>>
>> On Thu, Mar 14, 2019 at 11:26 AM Ilnar Salimzianov <il...@selimcan.org>
>> wrote:
>>
>>> -------- Forwarded Message --------
>>> Subject: RBMT from Kazakh to Turkish
>>> Date: Wed, 13 Mar 2019 19:07:42 +0000
>>> From: Daniyar Nariman <n.dani...@innopolis.ru>
>>> To: il...@selimcan.org <il...@selimcan.org>
>>>
>>> Dear Ilnar Salimzianov,
>>>
>>> My name is Nariman. I am a third-year bachelor student at Innopolis
>>> University (Russia, Tatarstan). I am studying Data Science and am
>>> really interested in disciplines such as machine learning, natural
>>> language processing, information retrieval, etc.
>>>
>>> Recently I read your paper, RBMT from Kazakh to Turkish, which was
>>> published in EAMT 2018. It was really interesting to read. The thing
>>> is, I am applying to GSoC (Google Summer of Code) this year with
>>> Apertium, but I am still thinking about the topic I would like to
>>> deal with. One of the topics was to bring a chosen language pair to
>>> state-of-the-art quality, and I would like to work on the
>>> Kazakh-Turkish pair, as Kazakh is my mother tongue and I studied
>>> Turkish in high school for 5 years.
>>>
>>> I would like to ask if there are any restrictions on how to increase
>>> the quality of this pair, apart from adding a large number of rules
>>> or expanding the dictionary (taken for granted), for instance by
>>> optimizing the algorithms given in the pipeline.
>>> I am asking this question because by modifying some part of the
>>> pipeline, we can increase accuracy on our pair of languages but
>>> decrease it on another pair, and constructing a different pipeline
>>> for different pairs is not a good idea in my opinion.
>>>
>>> Thanks in advance!
>>>
>>> Best Regards,
>>>
>>> Daniyar Nariman
>>>
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> Apertium-stuff@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff

--
Sorry for the brevity, sent from K-9 Mail.