Hello José,

Thank you for your interesting comments on the task. My notes on them are
below.

2012/4/5 José Emilio Muñoz López <[email protected]>

> Hello,
>
> I am really interested in joining Apertium to participate in the Google
> Summer of Code. I am a 2nd year Electronics and Computer Science student at
> the University of Edinburgh.
>
> I find the task described at the title of this thread particularly
> appealing. I have read its description on the GSoC ideas page, and I have
> some doubts and thoughts I would like to discuss here.
>
> The information we want to extract from the log files, would it be used to
> automatically generate new translation rules? or would it be presented
> (perhaps in some summarised form, for example giving the most
> frequent/significant changes) to someone in charge of the translation
> rules?
>

Both are worth considering. Most users will not find it easy to work with
rules directly, so a summarized view would be useful. Once a rule has been
validated, generating it automatically in the Apertium format would be very
valuable. Bear in mind that newly generated rules may conflict with
existing ones, so we also need to think about how to handle that.


>
> Would it be possible to use statistical tools on this information? For
> example, we could calculate the probability of the correctness of a word in
> a given context. If the users always change a given translated word when it
> occurs in a certain context, then the engine could use this information to
> improve future translations. In this way the generation of new rules could
> be automated in some way. Is this a sensible idea?
>

I think we should rely on statistical thresholds to filter the information
we want to deal with.
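
As a rough illustration of what I mean by thresholds, here is a minimal
Python sketch. It assumes the post-editing logs have already been reduced
to (context, mt_word, final_word) tuples; that log format, the function
name and the two threshold values are hypothetical and only there to make
the idea concrete.

```python
from collections import Counter


def frequent_corrections(edits, min_count=5, min_ratio=0.8):
    """Keep only corrections that users make often and consistently.

    edits: iterable of (context, mt_word, final_word) tuples.
    This tuple layout is an assumed, simplified log format.
    """
    seen = Counter()     # times each (context, mt_word) was shown to a user
    changed = Counter()  # times it was replaced by a particular word
    for context, mt_word, final_word in edits:
        seen[(context, mt_word)] += 1
        if final_word != mt_word:
            changed[(context, mt_word, final_word)] += 1

    kept = []
    for (context, mt_word, final_word), n in changed.items():
        total = seen[(context, mt_word)]
        # Two thresholds: enough evidence, and enough agreement among users.
        if n >= min_count and n / total >= min_ratio:
            kept.append((context, mt_word, final_word, n, total))
    # Most frequent corrections first.
    return sorted(kept, key=lambda item: item[3], reverse=True)
```

Only the corrections that pass both thresholds would be shown to whoever
validates the rules; the rest would be treated as noise.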


>
> However this task seems to concentrate on the data mining rather than on
> its potential uses, is this right?
>

The approach should be complete: we should consider both the data mining
and its potential uses.


> in what ways should this information be presented? Would the graphical
> environment (described in the task page) be similar to the way Google
> translate works at the moment, where users get a drop down list of popular
> alternatives when they click on a word?
>

I do not really see this kind of interaction as part of this task. The
interaction you describe is the one that takes place when users use the AWI
and select the correct words for their translations. I think this task
should concentrate on extracting valuable information from the post-editing
logs and then exploring ways to transform it into material that can improve
the Apertium engine.


>
> I am not sure about this, but why does Apertium AWI not seem to support
> all of Apertium language pairs?
>

It is just a matter of installing the other pairs on the server.


>
> What programming languages would I need to help develop this task? What
> would you recommend me to do in order to gain a better understanding of
> Apertium and this task?
>

The task requires knowledge of a scripting language, a little XML to
understand Apertium rules and dictionaries, and whatever else is useful for
building a small environment where the user can interact with the extracted
information and generate rules or dictionary entries.
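
To give an idea of the XML involved, here is a small Python sketch that
builds one entry in the style of an Apertium bilingual dictionary from a
validated correction. The function name and the example words are
hypothetical, and the sketch is deliberately simplified: it handles only
single-word lemmas with one part-of-speech symbol, while real entries often
carry more symbols (gender, number, and so on), and the symbol must already
be declared in the dictionary.

```python
import xml.etree.ElementTree as ET


def bidix_entry(src_lemma, trg_lemma, pos):
    """Return an <e> element, as text, pairing two single-word lemmas.

    Simplified: one part-of-speech symbol on each side, no multiword
    lemmas, no translation direction restrictions.
    """
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")

    left = ET.SubElement(pair, "l")   # source-language side
    left.text = src_lemma
    ET.SubElement(left, "s", {"n": pos})

    right = ET.SubElement(pair, "r")  # target-language side
    right.text = trg_lemma
    ET.SubElement(right, "s", {"n": pos})

    return ET.tostring(entry, encoding="unicode")


# Hypothetical example: a Spanish-English noun pair.
print(bidix_entry("casa", "house", "n"))
```

The generated entry would still need to be reviewed by a person before
being added to a dictionary.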

Best,
Luis


>
> Please ask me any information you may need. More details on my programming
> knowledge and past experiences can be found in the thread "GSoC" posted by
> Jacob Nordfalk on 2012-04-02 (
> https://sourceforge.net/mailarchive/forum.php?thread_name=CAKckPXZF0Bq_PWkk9s1rzFtZN%3DQ9s254fnQgWDgBwx35weU0kA%40mail.gmail.com&forum_name=apertium-stuff
> )
>
> Thank you very much for your help,
> José Emilio Muñoz
>
>
>
> ------------------------------------------------------------------------------
> Better than sec? Nothing is better than sec when it comes to
> monitoring Big Data applications. Try Boundary one-second
> resolution app monitoring today. Free.
> http://p.sf.net/sfu/Boundary-dev2dev
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
>