Registering my intention to apply to work on
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Monolingual_and_bilingual_data_decoupling

The idea as it is now is very conservative. It basically boils down to
changing the storage for monodixes. This is certainly doable within the
timeframe, and possibly even trivial with SVN's file-level externals.

But I've always wanted to prove that turning Apertium into a more classic
analyse → translate pipe is not only doable, but preferable, as it would
eliminate a lot of arbitrary limitations in the current pipeline.

E.g., I find http://wiki.apertium.org/wiki/Why_we_trim a minor travesty.
Points 1 and 2 are screaming "*I'm a bug, fix me!*", but instead the problem
was propagated all the way back through the chain. Losing information
before the final target sentence is generated feels entirely wrong to me.
The chain should keep full source language information throughout the
pipe.
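To make the idea concrete, here is a minimal Python sketch of a pipe in which every token carries its full source analysis all the way to the end. It assumes a simplified Apertium-style stream (`^surface/lemma<tag>...$`, one reading per unit, no escaping or superblanks). The names `Token`, `analyse`, `translate` and `to_target_stream`, the toy bidix keyed on (lemma, first tag), and the `^@lemma<tags>/surface$` output for bidix misses are all hypothetical, not existing Apertium tools or formats.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One lexical unit in a simplified Apertium-style stream: ^surface/lemma<tag1><tag2>$
# (assumption: single reading, no escaping, no superblanks)
LU_RE = re.compile(r"\^([^/$]*)/([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


@dataclass
class Token:
    surface: str                  # source surface form, kept to the end
    lemma: str                    # source lemma
    tags: list[str]               # source morphological tags
    target: Optional[str] = None  # target lemma; None if the bidix has no entry


def analyse(stream: str) -> list[Token]:
    """Parse a stream of lexical units into tokens with full source analyses."""
    return [Token(surface, lemma, TAG_RE.findall(tagstr))
            for surface, lemma, tagstr in LU_RE.findall(stream)]


def translate(tokens: list[Token], bidix: dict[tuple[str, str], str]) -> list[Token]:
    """Look up each token in a toy bidix keyed on (lemma, first tag).

    Nothing is trimmed: a miss just leaves target as None, and the source
    analysis stays attached to the token.
    """
    for t in tokens:
        t.target = bidix.get((t.lemma, t.tags[0] if t.tags else ""))
    return tokens


def to_target_stream(tokens: list[Token]) -> str:
    """Emit target units; bidix misses are marked with @ but keep their analysis."""
    units = []
    for t in tokens:
        tags = "".join(f"<{tag}>" for tag in t.tags)
        if t.target is not None:
            units.append(f"^{t.target}{tags}$")
        else:
            units.append(f"^@{t.lemma}{tags}/{t.surface}$")
    return " ".join(units)


tokens = analyse("^the/the<det><def>$ ^houses/house<n><pl>$ ^gleamed/gleam<vblex><past>$")
translate(tokens, {("the", "det"): "el", ("house", "n"): "casa"})
print(to_target_stream(tokens))
# ^el<det><def>$ ^casa<n><pl>$ ^@gleam<vblex><past>/gleamed$
```

Tags are copied unchanged to the target side here purely for brevity; a real transfer step would map them. The point is only that a bidix miss no longer throws away the source lemma and tags.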

Point 3 outlines a proper problem with multi-word expressions, but I don't
think the problem is as dire as stated there. Noun phrases and compounds
tend to stay together; contractions should be split into multiple separate
tokens and handled as such. Are there any examples of compounds where the
parts need to be re-arranged in the target language?
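As a rough illustration of splitting contractions into separate tokens: assuming the analyser joins the parts of a contraction with `+` inside one lexical unit (e.g. Spanish *del* analysed as `de<pr>+el<det>...`), the hypothetical `split_contraction` below turns such a unit into one unit per part. Keeping the original surface form on each part is an assumption made for the sketch so that nothing is lost, not a worked-out design.

```python
import re

# A single lexical unit: ^surface/analysis$ (assumption: one reading, no escaping)
LU_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def split_contraction(unit: str) -> list[str]:
    """Split a '+'-joined analysis into one lexical unit per part.

    Each part keeps the original surface form, so no source information is lost.
    """
    m = LU_RE.fullmatch(unit)
    if m is None:
        raise ValueError(f"not a lexical unit: {unit!r}")
    surface, analysis = m.groups()
    return [f"^{surface}/{part}$" for part in analysis.split("+")]


print(split_contraction("^del/de<pr>+el<det><def><m><sg>$"))
# ['^del/de<pr>$', '^del/el<det><def><m><sg>$']
```

Units without a `+` pass through unchanged, so the step can run over the whole stream.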

See also http://wiki.apertium.org/wiki/Talk:Why_we_trim

Anyway, I propose solving the GSoC task as-is, and if there is time (which
I believe there will be), trying to overhaul the core Apertium way of doing
things.

-- Tino Didriksen
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
