On 2015-03-04 10:48, Tino Didriksen wrote:
> The APY backend of http://apertium.org/ has been gathering
> frequency data on words that fail translation, but so far nobody has
> really known about it or been able to inspect this data.
>
> Now you can: http://apertium.projectjj.com/missingFreqs.php

Nice work!

> It lists pairs and how many entries a pair has, then for each pair it
> will list the 1000 most frequent words,
> e.g. http://apertium.projectjj.com/missingFreqs.php?pair=swe-dan
>
> There's a lot of source/target language confusion and people using the
> entirely wrong language pair, which means we have a problem and need
> to fix the apertium.org interface so people don't make that
> mistake. Or make it detect languages better and override people's
> choice when they're clearly wrong.

Yeah, that definitely needs doing.

> The source of the script is
> at http://apertium.projectjj.com/missingFreqs.txt

For the kaz-tat, tat-kaz directions you could grep out Latin characters,
would remove at least some of the bokmål and nynorsk :D

F.
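
A rough sketch of the Latin-character filter suggested for the kaz-tat and tat-kaz lists, in Python. It assumes the missing words are already available as plain strings; the function name and the sample words are hypothetical, and none of this comes from the missingFreqs script itself.

```python
import re

# Any basic or accented Latin letter: A-Z, a-z, the Latin-1 Supplement
# letters, and Latin Extended-A/B. Cyrillic lies outside these ranges.
LATIN = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]")


def drop_latin_words(words):
    """Return only the words that contain no Latin-script letters."""
    return [w for w in words if not LATIN.search(w)]


# Hypothetical sample: Cyrillic entries mixed with Norwegian input.
sample = ["және", "ikke", "китап", "også", "være", "мәктәп"]
print(drop_latin_words(sample))
```

This only drops entries that contain at least one Latin letter, so wrong-language input written in Cyrillic would still get through, and the frequency counts would have to be carried along with each word in a real run.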
