On 2015-03-04 10:48, Tino Didriksen wrote:
> The APY backend of http://apertium.org/ [1] has been gathering
> frequency data on words that fail translation, but so far nobody has
> really known about it or been able to inspect this data.
> 
> Now you can: http://apertium.projectjj.com/missingFreqs.php [2]

Nice work!

> It lists pairs and how many entries a pair has, then for each pair it
> will list the 1000 most frequent words,
> e.g. http://apertium.projectjj.com/missingFreqs.php?pair=swe-dan [3]
> 
> There's a lot of source/target language confusion and people using the
> entirely wrong language pair, which means we have a problem and need
> to fix the apertium.org [4] interface so people don't make that
> mistake. Or make it detect languages better and override people's
> choice when they're clearly wrong.

Yeah, that definitely needs doing.

> The source of the script is
> at http://apertium.projectjj.com/missingFreqs.txt [5]

For the kaz-tat and tat-kaz directions you could grep out words with Latin
characters, which would remove at least some of the Bokmål and Nynorsk :D
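Something like this, as a rough sketch. It assumes the frequency list has already been reduced to a plain list of words (I haven't checked the page's actual output format), and `has_latin` / `drop_latin` are just made-up helper names. Kazakh and Tatar here are in Cyrillic, so any word containing a Latin-script letter is almost certainly from the wrong pair:

```python
import unicodedata


def has_latin(word: str) -> bool:
    """True if any letter in `word` belongs to the Latin script (incl. å, ø, æ)."""
    return any(
        ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")
        for ch in word
    )


def drop_latin(words):
    """Keep only words with no Latin letters, e.g. Cyrillic kaz/tat forms."""
    return [w for w in words if not has_latin(w)]


if __name__ == "__main__":
    sample = ["китап", "bokmål", "және", "ikkje", "мен"]
    print(drop_latin(sample))  # ['китап', 'және', 'мен']
```

Checking the Unicode character name rather than matching `[A-Za-z]` also catches letters like å and ø, though in practice nearly every Bokmål/Nynorsk word has plain ASCII letters too.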

F.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff