Hi Nariman,
The structure of the system is more or less the same across all pairs,
but there are some components that we use in some and don't use in
others. For example, the statistical system for choosing the correct
rule to apply when there is ambiguity is a work in progress, and is only
in a few pairs.
Your question about breaking a system by making changes is a valid one,
but GSoC students don't typically make changes to programs we have in
production. When a new component is written, it is tested and introduced
in only a few pairs at first.
There are a number of ways to increase the quality of a system, but the
most urgent tasks are usually things like expanding the dictionary and
writing more transfer rules. Kazakh-Turkish would have been a nice
domain for you to work on given your proficiency in both, but it has
been getting quite a lot of attention recently and perhaps it would be
better to choose some other Turkic pair (I've been thinking about
Bashkurt-Turkish).
So to recap:
For improving/creating language pairs, the tools are already there and
you will be making/improving things like a dictionary of words in both
languages, rules to choose the right words, and rules to reorder and
change the words so they make sense in the target language. This is
akin to developing language resources and doesn't require a whole lot
of programming expertise, but some scripting is useful.
If you are a hardcore programmer, you can develop a new component or
improve some features of the system.
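As a rough illustration of the recap above, here is a toy sketch in
Python. It is not Apertium's actual data format or tooling: every name
in it (BIDIX, LEX_RULES, reorder, select, translate) and the tiny
English-Spanish word list are made up for the example. It only shows the
general idea of a shared engine driven by pair-specific data, with an
optional component that a pair may or may not use.

```python
# Toy sketch only: NOT Apertium's real formats or tools.
# All names and entries below are hypothetical, for illustration.

# Bilingual dictionary: source word -> candidate target words.
BIDIX = {
    "the": ["la"],
    "red": ["roja"],
    "house": ["casa"],
    "bank": ["banco", "orilla"],  # ambiguous entry
    "river": ["río"],
}

# Rules to choose the right word: (source word, context word) -> target.
LEX_RULES = {("bank", "river"): "orilla"}

# Source words treated as adjectives by the reordering rule.
ADJECTIVES = {"red"}


def reorder(words):
    """Reordering rule: turn adjective + noun into noun + adjective."""
    out = list(words)
    i = 0
    while i < len(out) - 1:
        if out[i] in ADJECTIVES:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def select(word, sentence_words):
    """Pick a target word, using context rules when one matches."""
    candidates = BIDIX.get(word, ["*" + word])  # mark unknown words
    for (src, ctx), tgt in LEX_RULES.items():
        if src == word and ctx in sentence_words and tgt in candidates:
            return tgt
    return candidates[0]


def translate(sentence, use_lexical_selection=True):
    """Run the toy pipeline; the selection component is optional."""
    words = sentence.split()
    reordered = reorder(words)
    if use_lexical_selection:
        return " ".join(select(w, words) for w in reordered)
    # Without the optional component, always take the first candidate.
    return " ".join(BIDIX.get(w, ["*" + w])[0] for w in reordered)


print(translate("the red house"))
print(translate("river bank"))
print(translate("river bank", use_lexical_selection=False))
```

Improving a pair mostly means growing the data (the dictionary and the
rules), while the use_lexical_selection switch stands in for a component
that is enabled in some pairs and not in others.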
I'm sure someone has sent you this link, but here is a list of ideas for
projects we'd like to do this summer:
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
Best,
Memduh
On 14-03-2019 15:26, Daniyar Nariman via Apertium-stuff wrote:
Hi Sevilay,
In my message, I meant that Kazakh and Turkish are similar in terms of
affixes and sentence structure, while Kazakh and Russian differ more.
So if I increase the translation quality of the first pair by adding
some functionality to the pipeline, there is a chance that the same
change might not work on the second pair.
So my question is: does the pipeline have to be the same for all
language pairs, or can it differ?
------------------------------------------------------------------------
*From:* Sevilay Bayatlı <sevilaybaya...@gmail.com>
*Sent:* Thursday, March 14, 2019 1:13:18 PM
*To:* apertium-stuff@lists.sourceforge.net
*Subject:* Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish
Hi Daniyar,
Could you tell us how modifying some parts of the pipeline can increase
accuracy on one pair and decrease it on another?
Sevilay
On Thu, Mar 14, 2019 at 11:26 AM Ilnar Salimzianov <il...@selimcan.org> wrote:
-------- Forwarded Message --------
Subject: RBMT from Kazakh to Turkish
Date: Wed, 13 Mar 2019 19:07:42 +0000
From: Daniyar Nariman <n.dani...@innopolis.ru>
To: il...@selimcan.org
Dear Ilnar Salimzianov,
My name is Nariman. I am a third-year bachelor's student at
Innopolis University (Tatarstan, Russia). I am studying Data Science and
am really interested in disciplines such as machine learning, natural
language processing, and information retrieval.
Recently I read your paper, RBMT from Kazakh to Turkish, which was
published at EAMT 2018. It was really interesting to read. The thing is,
I am applying to GSoC (Google Summer of Code) with Apertium this year,
but I am still deciding on a topic. One of the topics was to bring a
given language pair to state-of-the-art quality, and I would like to
work on the Kazakh-Turkish pair, as Kazakh is my mother tongue and I
studied Turkish in high school for 5 years.
I would like to ask whether there are any restrictions on how to
increase the quality of this pair, apart from adding a large number of
rules or expanding the dictionary (which I take for granted), for
instance by optimizing the algorithms in the pipeline. I am asking
because by modifying some part of the pipeline, we could increase
accuracy on our pair of languages but decrease it on another pair, and
constructing a different pipeline for different pairs is not a good
idea in my opinion.
Thanks in advance!
Best Regards,
Daniyar Nariman
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
<mailto:Apertium-stuff@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/apertium-stuff