On Tue, 18 Mar 2014 at 01:08 -0700, Alex Aruj wrote:
> Hello, I was still unable to see the updates to dictionaries taking
> full effect even after trying the -d . es-en solution, but I will try
> running lt-comp again, checking the lr and rl directionality and
> automorf and autogen bin files.
>
> I have shared part of the GSOC proposal that I think is most directly
> relevant to the task. I would like some feedback on it if anyone has
> time. If any ideas about the project are misguided, please suggest
> alternatives. The formatting options are a little wacky on Windows 8
> MSWord--will certainly adjust later.

Comments:

I think it might be more convincing if you showed the existing coverage
on a range of corpora, and showed estimates of how many words you would
have to add in order to reach the targets you've given yourself.

I would like to see a week-by-week plan.

Procedure:

1) Calculate coverage over the whole corpus.
2) Get the number of known tokens / total tokens.
3) Find out how many more tokens you need to add in order to increase
   coverage by 1%.
4) Make a frequency list of unknown words.
5) Starting at the top of the list, count down the number of words and
   the token count.

This way you should be able to find how many tokens (surface forms) you
need to cover to increase coverage by 1%.

You seem to be confusing error rate with coverage. That en-es has a
coverage of 94% does not surprise me, but that it has an error rate of 6%
does. This would mean that you only need to change (postedit) 6 words in
100 in order to get an adequate translation. I suspect it is much
higher :)

Have you done the evaluation of your 4 texts for WER yet?

Fran

PS. I fixed the problem with 'nueve':

$ echo "son las nueve y todavía me da palo salir de la cama" | apertium es-en
They are the nine and still gives me stick go out of the bed

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
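
The procedure in steps 1) to 5) above could be sketched roughly as
follows. This is only an illustration, not an existing Apertium tool: the
function name `coverage_plan`, its parameters, and the idea of passing
the known surface forms as a plain Python set are all assumptions made
for the example. In practice the known/unknown status of each token
would come from the analyser's output.

```python
from collections import Counter


def coverage_plan(tokens, known, gain_points=1):
    """Estimate how many unknown surface forms must be added to raise
    coverage by `gain_points` percentage points.

    tokens: list of surface forms from the corpus (hypothetical input)
    known:  set of surface forms the analyser already recognises
            (hypothetical stand-in for real analyser output)
    """
    total = len(tokens)
    if total == 0:
        raise ValueError("empty corpus")

    # Steps 1 and 2: known tokens / total tokens.
    unknown = Counter(t for t in tokens if t not in known)
    known_count = total - sum(unknown.values())
    coverage = known_count / total

    # Step 3: tokens needed for the requested gain (integer ceiling).
    target = -(-total * gain_points // 100)

    # Steps 4 and 5: walk down the frequency list of unknown words,
    # accumulating token counts until the target is reached.
    forms, gained = [], 0
    for form, freq in unknown.most_common():
        if gained >= target:
            break
        forms.append(form)
        gained += freq

    return coverage, target, forms, gained
```

The returned `forms` list is the set of surface forms to add, and
`gained` is the number of corpus tokens they account for. If the
unknown words run out before `target` is reached, `gained` will be
smaller than `target`, which means the requested gain is not reachable
on that corpus.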
