Alex Aruj <[email protected]> writes:

> Hello, I was still unable to see the updates to dictionaries taking
> full effect even after trying the -d . es-en solution, but I will try
> running lt-comp again, checking the lr and rl directionality and
> automorf and autogen bin files.
en-es has a complete Makefile.am, so running ./autogen.sh at least once
and then make after each change to .dix files should be enough. That is,
you don't need to call lt-comp manually with en-es; the make system does
that for you.

> I have shared part of the GSOC proposal that I think is most directly
> relevant to the task. I would like some feedback on it if anyone has
> time. If any ideas about the project are misguided, please suggest
> alternatives. The formatting options are a little wacky on Windows 8
> MSWord--will certainly adjust later.
>
> https://drive.google.com/file/d/0By8YUPGatqZZb2NxV0NCUHdWTlU/edit?

"accuracy of grammar, not just in terms of vocabulary matching, which I
will already attempt to increase by 15-20%."

What does this mean? Increasing vocabulary by 15-20% could in itself be
totally useless if that increase is in infrequent forms. What we want is
to increase the _coverage_ on real text. The naive method is to run a
large text corpus through the translator and count the *'s compared to
words that get a translation. Since your task is to make the pair state
of the art, you should also look at the true coverage: if a form gets an
analysis as an adjective, it might still not be covered if that form can
also be an adverb. (This requires more manual work than naive coverage.)

"• Write script that allows quick recompilation of dix and bins, thus
avoiding user to input all the lt-comp calls to update the dix."

Already done, see above ;)

Regarding a "Quick Vocab Add" program, that's either a very simple
script or a huge project. We've had earlier GSoC projects on this[1];
making it work in general is a huge task. Making helper scripts and such
for quickly adding words is a good idea, and you should certainly do it,
but it's not really a deliverable.

[1] http://wiki.apertium.org/wiki/Easy_dictionary_maintenance

-- 
Kevin Brubeck Unhammer
GPG: 0x766AC60C
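
PS: the naive coverage count mentioned above could be sketched roughly
like this in Python. It assumes you already have the translator's output
as plain text, with unknown words marked by a leading *; the function
name, the whitespace tokenisation and the punctuation handling are
illustrative assumptions of this sketch, not part of any Apertium tool.

```python
import re


def naive_coverage(translated_text):
    """Return (known, unknown, coverage) for translator output in which
    unknown words carry a leading '*'.

    Assumption: tokens are whitespace-separated, and pure punctuation
    tokens are not counted as words.
    """
    # Keep only tokens that contain at least one word character.
    tokens = [t for t in translated_text.split() if re.search(r"\w", t)]
    # Strip leading quotes/brackets so that e.g. '"*foo' still counts as unknown.
    unknown = sum(1 for t in tokens if t.lstrip("\"'([¿¡").startswith("*"))
    known = len(tokens) - unknown
    coverage = known / len(tokens) if tokens else 0.0
    return known, unknown, coverage


if __name__ == "__main__":
    known, unknown, coverage = naive_coverage("the *gato sleeps on the *sofá .")
    print(f"known={known} unknown={unknown} coverage={coverage:.1%}")
```

This only gives the naive figure: a word without a * counts as covered
even when the analysis it got is missing other readings, so the true
coverage still needs manual checking.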
_______________________________________________ Apertium-stuff mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/apertium-stuff
