On 1 May 2013 16:59, David Cuenca <[email protected]> wrote:
> Dear all,
>
> Erik Möller, head of Engineering and Product Development in the Wikimedia
> Foundation, started a thread on the Wikimedia mailing list about the
> convenience or not of supporting open source machine translation. Original
> thread:
> http://lists.wikimedia.org/pipermail/wikimedia-l/2013-April/125350.html
>
> I suggested using software like Omegawiki or Wikidata as a frontend for
> building grammar and language pair files that software like Apertium uses:
> http://lists.wikimedia.org/pipermail/wikimedia-l/2013-April/125642.html
>

I guess the good news is that it's *already* feasible for us to build
translators using Wikipedia... we do it all the time :) See, for example,
the case of Spanish-Aragonese
(http://www.lrec-conf.org/proceedings/lrec2012/pdf/326_Paper.pdf).

We have a tool for extracting dictionaries from OmegaWiki, but it goes
unused because of licence incompatibilities. We could wait and see if
CC-BY-SA 4 adds the GPL as a compatible licence, but it might be better all
round if we were to switch to CC-BY-SA for the dictionaries. The GPL is not
a particularly suitable licence for dictionaries; in particular, it has no
waiver of database rights, and those rights could be used (in Europe, at
least) to make modified dictionaries proprietary. But that's beside the
point.

Wikidata looks promising to me. The last time I considered returning to
education, I was going to propose 'macro domain machine translation' as my
project. It would have involved extracting translation rules specific to
infobox properties (the simplest example would be eye colour, where,
translating from English, the translation ought to be in the plural rather
than the singular), and changing Apertium to accept two sets of rules,
unifying their pattern matching (similarly to analysis) and choosing the
second set of rules in the event of a conflict. Wikidata looks set to
provide more, and cleaner, data for such a task.

What would be really excellent would be Wikidata integration into
Wiktionary. I've been tinkering with DBpedia's Wiktionary extraction for a
while now, and the data extracted is still quite noisy. It would be great
if that extraction weren't necessary.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
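
As a rough illustration of the two-rule-set idea described in the message
above, here is a toy sketch in Python. It is not Apertium's rule format or
code: the names `merge_rules`, `apply_rules` and `LEXICON`, the tag names,
and the tiny English-Spanish lexicon are all hypothetical. It only shows
general and domain rules being unified into one pattern table, with the
second (domain) set winning when both define the same pattern.

```python
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]            # (lemma, tag), e.g. ("blue", "adj")
Pattern = Tuple[str, ...]          # sequence of tags a rule matches
Action = Callable[[List[Token]], List[str]]
RuleSet = Dict[Pattern, Action]

# Hypothetical toy lexicon: lemma -> (singular, plural) target forms.
LEXICON = {"blue": ("azul", "azules"), "eye": ("ojo", "ojos")}


def merge_rules(general: RuleSet, domain: RuleSet) -> RuleSet:
    """Unify two rule sets; on a pattern conflict the domain rule wins."""
    merged = dict(general)
    merged.update(domain)
    return merged


def apply_rules(tokens: List[Token], rules: RuleSet) -> List[str]:
    """Left-to-right, longest-match application of pattern rules."""
    out: List[str] = []
    longest = max((len(p) for p in rules), default=0)
    i = 0
    while i < len(tokens):
        # Try the longest window first, then shrink it.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            window = tokens[i:i + n]
            pattern = tuple(tag for _, tag in window)
            if pattern in rules:
                out.extend(rules[pattern](window))
                i += n
                break
        else:
            # No rule matched: pass the lemma through unchanged.
            out.append(tokens[i][0])
            i += 1
    return out


# General rules: default to the singular form.
general_rules: RuleSet = {
    ("adj",): lambda w: [LEXICON[w[0][0]][0]],
    ("n",): lambda w: [LEXICON[w[0][0]][0]],
}

# Domain rules for a hypothetical "eye colour" property: use the plural.
eye_colour_rules: RuleSet = {
    ("adj",): lambda w: [LEXICON[w[0][0]][1]],
}

merged_rules = merge_rules(general_rules, eye_colour_rules)
general_output = apply_rules([("blue", "adj")], general_rules)
domain_output = apply_rules([("blue", "adj")], merged_rules)
```

With the general rules alone the adjective comes out in the singular; with
the eye-colour rules merged in, the conflicting `("adj",)` pattern resolves
to the domain rule, while the non-conflicting noun rule is kept as it was.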
