On 7 March 2011 16:28, Antonio Toral <[email protected]> wrote:
> Hi,
>
> I'd like to add this idea:
>
>
> task: dictionary induction from wikis
> difficulty: 3. medium
> description: Extract dictionaries from linguistic wikis
> rationale: Wiki dictionaries and encyclopedias (e.g. omegawiki,
> wiktionary, wikipedia) contain information (e.g. bilingual equivalences,
> morphological features, conjugations, etc.) that could be exploited to
> speed up the development of dictionaries for Apertium. This task aims at
> automatically building dictionaries by extracting different pieces of
> information from wiki structures such as interlingual links, infoboxes,
> etc.
> requirements: SQL, mediawiki syntax, perl, maybe C++ or Java
>

FWIW, there's a branch of dbpedia that has started to do something
similar: 
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/e406efd61660

At the moment it only targets de.wiktionary, but where possible it's
being designed to keep the templates independent of the code. It should
therefore be about as easy to adapt to new wiktionaries as dbpedia
proper is to new wikipedias: some things unavoidably need to be added
to the code, but most information comes from the templates.
-- mostly Scala, some Java

There's also a Freedict-related project to extract TEI dictionaries
from the Russian wiktionary:
http://wiktionary-export.nataraj.su/en/about.html -- Perl

There's also a Java-based parser for MediaWiki-style wikis:
http://code.google.com/p/jwpl/

>
> Would anyone else be interested as mentor?

How about you? :) If someone's interested in working on the dbpedia
framework, I've done some work on it, and would be happy to mentor
that.


-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
