Re: [Apertium-stuff] MT and Wikipedia

Jimmy O'Regan Wed, 01 May 2013 10:21:10 -0700

On 1 May 2013 16:59, David Cuenca <[email protected]> wrote:
> Dear all,
>
> Erik Möller, head of Engineering and Product Development in the Wikimedia
> Foundation, started a thread on the Wikimedia mailing list about the
> convenience or not of supporting open source machine translation. Original
> thread:
> http://lists.wikimedia.org/pipermail/wikimedia-l/2013-April/125350.html
>
> I suggested using software like Omegawiki or Wikidata as a frontend for
> building grammar and language pair files that software like Apertium uses:
> http://lists.wikimedia.org/pipermail/wikimedia-l/2013-April/125642.html
>


I guess the good news is that it's *already* feasible for us to build
translators using Wikipedia... we do it all the time :) See, for
example, the case of Spanish-Aragonese
(http://www.lrec-conf.org/proceedings/lrec2012/pdf/326_Paper.pdf).

We have a tool for extracting dictionaries from OmegaWiki, but it goes
unused because of licence incompatibilities. We could wait and see if
CC-BY-SA 4 adds the GPL as a compatible licence, but it might be
better all round if we were to switch to CC-BY-SA for the dictionaries
- the GPL is not a particularly suitable licence for dictionaries. In
particular, it has no waiver of database rights, and that omission
could be used (in Europe, at least) to make modified dictionaries
proprietary.

But that's beside the point. Wikidata looks promising to me. The last
time I considered returning to education, I was going to propose
'macro domain machine translation' as my project. It would have
involved extracting translation rules specific to infobox properties
(the simplest example is eye colour, where, translating from English,
the translation ought to be in the plural rather than the singular).
It would also have meant changing Apertium to accept two sets of
rules, unifying their pattern matching (similarly to analysis), and
choosing the second set of rules in the event of a conflict. Wikidata
looks set to provide more, and cleaner, data for such a task.
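
To make the two-rule-set idea concrete, here is a rough sketch in
Python. It is not how Apertium's transfer is implemented; the Rule
class, the toy lexicon, the choice of Spanish as the target language,
and the function names are all made up for illustration. It only shows
the shape of the idea: both rule sets go into one pattern table, the
matcher takes the longest match from left to right, and where the two
sets define the same pattern, the second (domain) set wins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (surface form, tag); a toy stand-in for a lexical unit


@dataclass(frozen=True)
class Rule:
    pattern: Tuple[str, ...]                        # tag sequence to match
    action: Callable[[Sequence[Token]], List[str]]  # produces target words


# Hypothetical English -> Spanish entries: (singular, plural).
LEXICON = {"blue": ("azul", "azules"), "green": ("verde", "verdes")}


def adj_singular(span: Sequence[Token]) -> List[str]:
    return [LEXICON[span[0][0]][0]]


def adj_plural(span: Sequence[Token]) -> List[str]:
    return [LEXICON[span[0][0]][1]]


GENERAL = [Rule(("adj",), adj_singular)]   # first set: general-purpose rules
EYE_COLOUR = [Rule(("adj",), adj_plural)]  # second set: rules for one property


def merge_rules(general: Sequence[Rule],
                domain: Sequence[Rule]) -> Dict[Tuple[str, ...], Rule]:
    """Build one pattern table; a domain rule replaces a general rule
    that has the same pattern."""
    table = {rule.pattern: rule for rule in general}
    table.update({rule.pattern: rule for rule in domain})
    return table


def apply_rules(tokens: Sequence[Token],
                table: Dict[Tuple[str, ...], Rule]) -> List[str]:
    """Left-to-right, longest-match application of the merged table.
    Tokens that no rule covers are copied through unchanged."""
    out: List[str] = []
    longest = max((len(p) for p in table), default=0)
    i = 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            rule = table.get(tuple(tag for _, tag in span))
            if rule is not None:
                out.extend(rule.action(span))
                i += n
                break
        else:
            out.append(tokens[i][0])
            i += 1
    return out
```

With only the general set, apply_rules([("blue", "adj")],
merge_rules(GENERAL, [])) gives the singular form; merging in
EYE_COLOUR makes the same input come out in the plural.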

What would be really excellent would be Wikidata integration into
Wiktionary. I've been tinkering with DBpedia's Wiktionary extraction
for a while now, and the data extracted is still quite noisy. It would
be great if that extraction weren't necessary.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
