On Tue, 18 Mar 2014 at 01:08 -0700, Alex Aruj wrote:
> Hello, I was still unable to see the updates to dictionaries taking
> full effect even after trying the -d . es-en solution, but I will try
> running lt-comp again, checking the lr and rl directionality and
> automorf and autogen bin files.
> 
> 
> I have shared part of the GSOC proposal that I think is most directly
> relevant to the task. I would like some feedback on it if anyone has
> time. If any ideas about the project are misguided, please suggest
> alternatives. The formatting options are a little wacky on Windows 8
> MSWord--will certainly adjust later.

Comments:

I think it might be more convincing if you showed the existing coverage
on a range of corpora, and showed estimates of how many words you would
have to add in order to reach the targets you've given yourself. I would
like to see a week-by-week plan.

Procedure:

1) Calculate coverage over the whole corpus.
2) Get the number of known tokens / total tokens.
3) Find out how many more tokens you need to cover in order to increase
coverage by 1%.
4) Make a frequency list of unknown words.
5) Starting at the top of the list, count down, keeping track of the
number of words and their token count. This way you should be able to
find how many surface forms you need to cover to increase coverage by 1%.
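
The procedure above can be sketched roughly as follows. This is only an
illustration, not an existing Apertium tool: the function name
coverage_plan, the gain_points parameter and the `known` set are all
made up for the example, and how you decide whether a token is known
(for instance from the analyser's output) is left outside the sketch.

```python
from collections import Counter


def coverage_plan(tokens, known, gain_points=1):
    """Estimate which unknown surface forms to add for a coverage gain.

    tokens: list of surface forms from the corpus (assumed pre-tokenised).
    known: set of surface forms the analyser already recognises
           (hypothetical input, however you obtain it).
    gain_points: desired coverage increase in percentage points.
    """
    total = len(tokens)
    if total == 0:
        raise ValueError("corpus is empty")

    # Steps 1-2: coverage = known tokens / total tokens.
    unknown = Counter(t for t in tokens if t not in known)
    coverage = (total - sum(unknown.values())) / total

    # Step 3: tokens that must become known (integer ceiling of the gain).
    needed = (total * gain_points + 99) // 100

    # Steps 4-5: walk the frequency list from the top, summing token counts.
    forms, gained = [], 0
    for form, freq in unknown.most_common():
        if gained >= needed:
            break
        forms.append(form)
        gained += freq

    return coverage, needed, forms, gained
```

The length of the returned list of forms is the estimate of how many
surface forms you would have to add. If the frequency list runs out
before the target is reached, `gained` will be smaller than `needed`.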

You seem to be confusing error rate with coverage. That en-es has a
coverage of 94% does not surprise me; that it has an error rate of 6%
does. An error rate of 6% would mean that you only need to change
(postedit) 6 words in 100 in order to get an adequate translation. I
suspect the real error rate is much higher :)
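
To make the difference concrete: coverage only counts how many tokens
the analyser knows, while word error rate compares the MT output with a
postedited version. A minimal sketch of WER as word-level edit distance
divided by reference length is below. The function name wer is just for
illustration, it assumes whitespace tokenisation, and it is not the
evaluation script shipped with Apertium.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length.

    reference: the postedited (corrected) translation.
    hypothesis: the raw machine translation output.
    Both are split on whitespace, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # word missing from hypothesis
                           cur[j - 1] + 1,           # extra word in hypothesis
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur

    return prev[-1] / len(ref)
```

With this definition a sentence can be made entirely of known words
(100% coverage) and still have a high WER, because known words can be
wrongly translated, wrongly ordered or wrongly inflected.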

Have you done the evaluation of your 4 texts for WER yet?

Fran

PS. I fixed the problem with 'nueve':

$ echo "son las nueve y todavía me da palo salir de la cama" | apertium es-en
They are the nine and still gives me stick go out of the bed


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
