Re: [Apertium-stuff] Data Decoupling for GSOC 2013

Francis Tyers Tue, 23 Apr 2013 05:50:45 -0700

On Mon, 15 Apr 2013 at 13:57 +0200, Tino Didriksen wrote:
> Registering my intention to apply to work on
> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Monolingual_and_bilingual_data_decoupling
> 
> 
> The idea as it is now is very conservative. It basically boils down to
> changing the storage for monodixes. This is certainly doable in the
> timeframe...and possibly even trivial with SVN's file-level externals.


I'd prefer to avoid fiddling with SVN; it's a bit of a mess. I'm happy
to have copies.

My current preference would be for something like
apertium-{three-letter code} for the monolingual data, which includes:

* A .dix file (or .lexc/.twol files)
* A .prob file (for the statistical tagger)
* A .rlx file (for the CG if any)

You can take a look at incubator/apertium-kaz for an idea. 
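
A minimal sketch of the layout described above, in Python. The function
name monolingual_files and the file-name pattern
apertium-{code}.{code}.{extension} are assumptions made for illustration;
the message only specifies the directory name and the file types, so check
incubator/apertium-kaz for the names actually in use.

```python
def monolingual_files(code, hfst=False):
    """List the files a hypothetical apertium-{code} module would hold.

    The naming pattern below is an assumption, not a settled convention.
    """
    if len(code) != 3 or not code.isalpha():
        raise ValueError("expected a three-letter language code")

    module = f"apertium-{code}"
    stem = f"{module}.{code}"  # assumed pattern: <module>.<code>.<ext>

    # Either a single .dix dictionary, or a .lexc/.twol pair.
    if hfst:
        lexicon = [f"{stem}.lexc", f"{stem}.twol"]
    else:
        lexicon = [f"{stem}.dix"]

    # .prob for the statistical tagger, .rlx for the CG (if any).
    return module, lexicon + [f"{stem}.prob", f"{stem}.rlx"]
```

Calling monolingual_files("kaz", hfst=True) gives the module name plus the
.lexc, .twol, .prob and .rlx entries; a pair would then take its copies of
the monolingual data from that one module.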

> But, I've always wanted to prove that turning Apertium into a more
> classic analyse → translate pipe is not only doable, but should be
> preferred as it would eliminate a lot of arbitrary limitations in the
> current pipeline.

I think this would be an interesting task too. We could use the Romance,
Slavic or North Germanic languages as a test bed.

Romance language pairs: ca-it, es-an, es-ast, es-gl, es-ca, es-it,
es-pt, es-ro, fr-ca, fr-es, oc-ca, oc-es, pt-ca, pt-gl

North Germanic language pairs: sv-da, nn-nb, nursery/no-en, is-en

Slavic language pairs: mk-bg, sh-mk, sh-sl
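
To get a feel for how much monolingual data is shared within one of these
groups, the Romance list above can be tallied per language. This is only
an illustrative sketch: the helper name language_usage is hypothetical,
and the pair names are copied from the list as written.

```python
from collections import Counter

# Romance language pairs, copied from the list above.
ROMANCE_PAIRS = (
    "ca-it es-an es-ast es-gl es-ca es-it es-pt es-ro "
    "fr-ca fr-es oc-ca oc-es pt-ca pt-gl"
).split()


def language_usage(pairs):
    """Count how many pairs each language code takes part in."""
    counts = Counter()
    for pair in pairs:
        left, right = pair.split("-")
        counts[left] += 1
        counts[right] += 1
    return counts


usage = language_usage(ROMANCE_PAIRS)
```

Here usage["es"] is 9 and usage["ca"] is 5, across 10 distinct languages
in 14 pairs, so a single decoupled Spanish module would stand in for nine
separately held copies.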

Fran

