Per Tunedal <[email protected]> writes:

> Hi,
> wouldn't it be great if the input language was detected automatically?
> Maybe TextCat http://www.let.rug.nl/vannoord/TextCat/ would do the
> trick?

It already is, using CLD2 on the server; but there are many languages
not supported by CLD2, and it might also be trained on different types
of text.

I suppose we could try training my Python port
https://github.com/unhammer/gt-CorpusTools/blob/master/corpustools/text_cat.py
and see if it turns out to work better; we might even make it learn
online if people choose a different source language that overrides the
textcat suggestion :-)

Or at least manually add mistakes from the top of the various frequency
lists to the word-override lists.

-- 
Kevin Brubeck Unhammer
GPG: 0x766AC60C
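
To make the idea concrete, here is a rough sketch of textcat-style guessing: rank-ordered character n-gram profiles compared with an "out-of-place" distance, plus a word-override table that could be fed online when a user corrects the suggestion. This is only an illustration under my own assumptions. The names (`ngram_profile`, `out_of_place`, `LanguageGuesser`, `add_override`) are hypothetical and are not the API of text_cat.py or CLD2, and letting override words simply outvote the n-gram guess is a simplification.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top=300):
    """Return the `top` most frequent character n-grams, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language get a max penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


class LanguageGuesser:
    """Hypothetical wrapper: n-gram profiles plus a word-override table."""

    def __init__(self):
        self.profiles = {}        # language code -> n-gram profile
        self.word_overrides = {}  # word -> language code

    def train(self, lang, text):
        self.profiles[lang] = ngram_profile(text)

    def add_override(self, word, lang):
        # Could be called when a user picks a different source language
        # than the one suggested (assumption, not existing behaviour).
        self.word_overrides[word.lower()] = lang

    def guess(self, text):
        words = text.lower().split()
        votes = Counter(
            self.word_overrides[w] for w in words if w in self.word_overrides
        )
        if votes:
            return votes.most_common(1)[0][0]
        doc = ngram_profile(text)
        return min(self.profiles, key=lambda lang: out_of_place(doc, self.profiles[lang]))
```

Usage would be something like `g = LanguageGuesser()`, `g.train("eng", english_text)`, `g.train("nob", norwegian_text)`, then `g.guess(user_input)`; how well it works depends entirely on the training text, which is the same caveat as with CLD2.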
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
