Re: [Apertium-stuff] (GSoC) - Extracting knowledge from Apertium's post-edition logs to improve translations

José Emilio Muñoz Fri, 06 Apr 2012 04:53:47 -0700

Thank you very much. I have now submitted my application.


On 05/04/2012, at 02:47, Luis Villarejo wrote:

> Hello José,
> 
> Thank you for your interesting comments on the task. Below you will find my 
> notes on them.
> 
> 2012/4/5 José Emilio Muñoz López <[email protected]>
> Hello,
> 
> I am really interested in joining Apertium to participate in the Google 
> Summer of Code. I am a 2nd year Electronics and Computer Science student at 
> the University of Edinburgh.
> 
> I find the task described in the title of this thread particularly appealing. 
> I have read its description on the GSoC ideas page, and I have some questions 
> and thoughts I would like to discuss here. 
> 
> Would the information we want to extract from the log files be used to 
> automatically generate new translation rules? Or would it be presented 
> (perhaps in some summarised form, for example giving the most 
> frequent/significant changes) to someone in charge of the translation rules? 
> 
> We can think of both things. Rules are not always easy for most users to 
> understand, so a summarised form may be useful. And once a rule is 
> validated, generating it automatically in the Apertium format would 
> be very interesting. Bear in mind that we may generate new rules that 
> conflict with existing ones; we must also think about that.
>  
> 
> Would it be possible to use statistical tools on this information? For 
> example, we could calculate the probability of the correctness of a word in a 
> given context. If the users always change a given translated word when it 
> occurs in a certain context, then the engine could use this information to 
> improve future translations. In this way the generation of new rules could be 
> automated in some way. Is this a sensible idea?
> 
> I think we should rely on statistical thresholds to filter the information we 
> want to deal with.
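
To make the threshold idea concrete, here is a minimal sketch in Python. It assumes the post-edition logs have already been reduced to (context, machine-translated word, final word) records, where the final word equals the machine-translated word if the user left it unchanged. That record format, the function name and the two threshold values are all hypothetical, not something the Apertium logs are known to provide.

```python
from collections import Counter, defaultdict


def frequent_corrections(records, min_count=3, min_ratio=0.8):
    """Return corrections that users make often and consistently.

    records: iterable of (context, mt_word, final_word) tuples
             (hypothetical format; final_word == mt_word means "kept").
    min_count: minimum number of times the correction must be seen.
    min_ratio: minimum share of occurrences in which it was applied.
    """
    # For each (context, mt_word), count what users ended up with.
    finals_by_key = defaultdict(Counter)
    for context, mt_word, final_word in records:
        finals_by_key[(context, mt_word)][final_word] += 1

    candidates = []
    for (context, mt_word), finals in finals_by_key.items():
        total = sum(finals.values())
        changed = [(w, n) for w, n in finals.items() if w != mt_word]
        if not changed:
            continue  # users never edited this word in this context
        best, n = max(changed, key=lambda item: item[1])
        ratio = n / total
        # Keep only corrections that pass both statistical thresholds.
        if n >= min_count and ratio >= min_ratio:
            candidates.append((context, mt_word, best, n, ratio))

    # Most consistent corrections first.
    candidates.sort(key=lambda c: c[4], reverse=True)
    return candidates
```

The surviving candidates could then be shown in summarised form to whoever maintains the rules, before any rule or dictionary entry is generated from them.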
>  
> 
> However, this task seems to concentrate on the data mining rather than on its 
> potential uses. Is this right?
> 
> The approach should be complete; we should think about both things.
>  
> In what ways should this information be presented? Would the graphical 
> environment (described in the task page) be similar to the way Google 
> Translate works at the moment, where users get a drop-down list of popular 
> alternatives when they click on a word? 
> 
> I am not really seeing this kind of interaction for this task. The 
> interaction you describe is the one that takes place when users use the AWI 
> and select the correct words for their translations. I think this task should 
> concentrate on extracting valuable information from the post-edition logs and 
> then explore the ways to transform it into valuable material to improve the 
> Apertium engine.
>  
> 
> I am not sure about this, but why does Apertium AWI not seem to support all 
> of Apertium's language pairs?
> 
> It is just a matter of installing other pairs on the server.
>  
> 
> What programming languages would I need to help develop this task? What would 
> you recommend I do in order to gain a better understanding of Apertium 
> and this task?
> 
> The task would require knowledge of a scripting language, a little bit of 
> XML to understand Apertium rules and dictionaries, and whatever else would be 
> useful to build a small environment where the user could interact with the 
> extracted information and generate rules or dictionary entries.
> 
> Best,
> Luis
>  
> 
> Please ask me for any information you may need. More details on my programming 
> knowledge and past experiences can be found in the thread "GSoC" posted by 
> Jacob Nordfalk on 2012-04-02 
> (https://sourceforge.net/mailarchive/forum.php?thread_name=CAKckPXZF0Bq_PWkk9s1rzFtZN%3DQ9s254fnQgWDgBwx35weU0kA%40mail.gmail.com&forum_name=apertium-stuff)
> 
> Thank you very much for your help,
> José Emilio Muñoz
> 
> 
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
