[Apertium-stuff] (GSoC) - Extracting knowledge from Apertium's post-edition logs to improve translations

José Emilio Muñoz López Wed, 04 Apr 2012 17:14:23 -0700

Hello,

I am very interested in joining Apertium to participate in the Google Summer
of Code. I am a second-year Electronics and Computer Science student at the
University of Edinburgh.

I find the task named in the title of this thread particularly appealing. I
have read its description on the GSoC ideas page, and I have some questions and
thoughts I would like to discuss here.

Would the information we want to extract from the log files be used to
automatically generate new translation rules, or would it be presented (perhaps
in some summarised form, for example giving the most frequent/significant
changes) to someone in charge of the translation rules?

Would it be possible to use statistical tools on this information? For example,
we could calculate the probability of the correctness of a word in a given
context. If the users always change a given translated word when it occurs in a
certain context, then the engine could use this information to improve future
translations. In this way the generation of new rules could be at least partly
automated. Is this a sensible idea?
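
To make the idea above a bit more concrete, here is a minimal sketch in Python.
The log format (tuples of preceding word, word produced by the engine, word
kept by the post-editor) and every name in it are assumptions made purely for
illustration; the real post-edition logs may look quite different, and a single
preceding word is only the simplest possible notion of "context".

```python
from collections import Counter

# Hypothetical log records, NOT Apertium's real log format:
# (preceding word, word produced by the engine, word kept by the post-editor)
LOG = [
    ("the", "bank", "bank"),
    ("river", "bank", "shore"),
    ("river", "bank", "shore"),
    ("river", "bank", "bank"),
]


def change_probabilities(records):
    """Estimate, for each (context, translated word) pair, how often
    post-editors replaced the translated word with something else."""
    seen = Counter()     # times the pair appeared in the logs
    changed = Counter()  # times the post-editor changed the word
    for context, mt_word, edited_word in records:
        key = (context, mt_word)
        seen[key] += 1
        if edited_word != mt_word:
            changed[key] += 1
    # Relative frequency of a change, per (context, word) pair
    return {key: changed[key] / seen[key] for key in seen}


probs = change_probabilities(LOG)
```

Pairs with a high estimated change probability (and enough occurrences to be
trustworthy) could then be flagged as candidates for a new or corrected rule.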

However, this task seems to concentrate on the data mining rather than on its
potential uses. Is this right? In what ways should this information be
presented? Would the graphical environment (described in the task page) be
similar to the way Google Translate works at the moment, where users get a
drop-down list of popular alternatives when they click on a word?

I am not sure about this, but why does Apertium AWI not seem to support all of
Apertium's language pairs?

What programming languages would I need to know to work on this task? What
would you recommend I do in order to gain a better understanding of Apertium
and this task?

Please ask me for any information you may need. More details on my programming
knowledge and past experiences can be found in the thread "GSoC" posted by
Jacob Nordfalk on 2012-04-02
(https://sourceforge.net/mailarchive/forum.php?thread_name=CAKckPXZF0Bq_PWkk9s1rzFtZN%3DQ9s254fnQgWDgBwx35weU0kA%40mail.gmail.com&forum_name=apertium-stuff)

Thank you very much for your help,
José Emilio Muñoz
