Dear Apertiumers,

As I said in my earlier message, I travelled to Olot to attend a Wikimedia pre-hackathon organized by Amical Wikimedia, a Catalan pro-Wikimedia organization. As you know, Apertium is one of the MT systems used in Wikimedia's Content Translation facility, which is used to translate Wikipedia articles.

Discussions there gave rise to two possible projects:

1. Using the database of post-edited machine translations in Content Translation. I have revived an old GSoC idea and added it ("Improving language pairs mining Mediawiki Content Translation postedits"). I have also dug out some code I had and created a GitHub repository for it. I would love to get feedback on this idea, as well as additional mentors for it.

2. MediaWiki is creating a lexical database called Lexeme. Its data is expected to carry a CC0 license and will be wiki-edited. This means that we would be able to use that data to improve Apertium language pairs (which are GPL), but not the other way around: Apertium data could not be uploaded to Lexeme unless all of its authors agreed to release it as CC0 too. People there may be open to designing Lexeme entries so that they contain all the information Apertium needs. Further down the line, the idea of building Apertium dictionaries the wiki way is attractive. This is not a mature GSoC idea yet, but I would like to see whether there is interest; if there is, I would turn it into a more detailed idea after talking to people at the Wikimedia Foundation such as Amir Aharoni and Lydia Pintscher. There is already a ticket on their website.


Any ideas/comments welcome!




Mikel L. Forcada
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03690 Sant Vicent del Raspeig
Office: +34 96 590 9776

Apertium-stuff mailing list
