On Mon, 15 Apr 2013 at 13:57 +0200, Tino Didriksen wrote:
> Registering my intention to apply to work on
> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Monolingual_and_bilingual_data_decoupling
>
>
> The idea as it is now is very conservative. It basically boils down to
> changing the storage for monodixes. This is certainly doable in the
> timeframe...and possibly even trivial with SVN's file-level externals.
I'd prefer to avoid fiddling with SVN; it's a bit of a mess. I'm happy
to have copies.
My current preference would be for something like
apertium-{three-letter code} for the monolingual data, which would include:
* A .dix file (or .lexc/.twol files)
* A .prob file (for the statistical tagger)
* A .rlx file (for the CG, if any)
You can take a look at incubator/apertium-kaz for an idea.
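As a rough sketch of that layout (not a description of what apertium-kaz
actually contains), the snippet below builds the list of files such a
package might hold. Only the package name apertium-{code} and the file
extensions come from the list above; the file-name pattern, the function
name and its parameters are assumptions made for illustration.

```python
from pathlib import Path


def monolingual_files(code: str, uses_lexc: bool = False, has_cg: bool = False) -> list[Path]:
    """Return the files a hypothetical apertium-{code} package would hold."""
    package = Path(f"apertium-{code}")
    # Assumed naming pattern: apertium-{code}.{code}.{extension}
    stem = f"{package.name}.{code}"
    # Morphology: either a single .dix file or a .lexc/.twol pair.
    extensions = ["lexc", "twol"] if uses_lexc else ["dix"]
    extensions.append("prob")  # model for the statistical tagger
    if has_cg:
        extensions.append("rlx")  # Constraint Grammar rules, if any
    return [package / f"{stem}.{ext}" for ext in extensions]


if __name__ == "__main__":
    for path in monolingual_files("kaz", uses_lexc=True, has_cg=True):
        print(path.as_posix())
```

A language pair would then refer to two such packages rather than carry
its own copies of the monolingual files.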
> But, I've always wanted to prove that turning Apertium into a more
> classic analyse → translate pipe is not only doable, but should be
> preferred as it would eliminate a lot of arbitrary limitations in the
> current pipeline.
I think this would be an interesting task too. We could use the
Romance, Slavic, or North Germanic languages as a test bed.
Romance language pairs: ca-it, es-an, es-ast, es-gl, es-ca, es-it,
es-pt, es-ro, fr-ca, fr-es, oc-ca, oc-es, pt-ca, pt-gl
North Germanic language pairs: sv-da, nn-nb, nursery/no-en, is-en
Slavic language pairs: mk-bg, sh-mk, sh-sl
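To get a feel for how much monolingual data each test bed would share,
here is a small sketch that counts how many of the listed pairs each
language code appears in. The pair lists are copied from above (with the
nursery/ prefix dropped); the names PAIRS and language_usage are only
illustrative.

```python
from collections import Counter

# Pair lists as given in this message, one string per family.
PAIRS = {
    "Romance": "ca-it es-an es-ast es-gl es-ca es-it es-pt es-ro "
               "fr-ca fr-es oc-ca oc-es pt-ca pt-gl",
    "North Germanic": "sv-da nn-nb no-en is-en",
    "Slavic": "mk-bg sh-mk sh-sl",
}


def language_usage(pairs: str) -> Counter:
    """Count in how many pairs each language code occurs."""
    return Counter(code for pair in pairs.split() for code in pair.split("-"))


if __name__ == "__main__":
    for family, pairs in PAIRS.items():
        # Keep only languages used by more than one pair.
        shared = {c: n for c, n in language_usage(pairs).most_common() if n > 1}
        print(f"{family}: {shared}")
```

Languages that occur in more than one pair are the ones where a single
shared monolingual package would replace several copies.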
Fran