Per Tunedal <[email protected]>
writes:

> Hi,
> wouldn't it be great if the input language was detected automatically?
> Maybe TextCat http://www.let.rug.nl/vannoord/TextCat/ would do the
> trick?

It already is, using CLD2 on the server. However, there are many
languages that CLD2 does not support, and it may have been trained on
different kinds of text from what users actually enter.

I suppose we could try training my Python port of TextCat
https://github.com/unhammer/gt-CorpusTools/blob/master/corpustools/text_cat.py
and see whether it works better. We might even make it learn online
whenever people choose a source language that overrides the textcat
suggestion :-) Or we could at least manually add the misclassified words
from the top of the various frequency lists to the word-override lists.
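
To make the idea concrete, here is a minimal sketch of TextCat-style
rank-order n-gram categorisation with an online update when the user
overrides the guess. It is not the code in text_cat.py: the class
TinyTextCat, its methods, and the profile size of 300 are hypothetical
names and values picked only for illustration.

```python
from collections import Counter

MAX_NGRAMS = 300  # profile size; an arbitrary assumption for this sketch


def ngrams(text, n_max=3):
    """Count character n-grams (1..n_max) in each space-padded word."""
    counts = Counter()
    for word in text.lower().split():
        padded = " " + word + " "
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return counts


def ranks(counts):
    """Map the most frequent n-grams to their rank (0 = most frequent)."""
    # Ties are broken alphabetically so the ranking is deterministic.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:MAX_NGRAMS]
    return {gram: rank for rank, (gram, _) in enumerate(top)}


class TinyTextCat:
    """Hypothetical toy classifier, not the text_cat.py API."""

    def __init__(self):
        self.counts = {}  # language code -> Counter of n-grams

    def learn(self, lang, text):
        """Add the n-grams of text to the profile for lang."""
        self.counts.setdefault(lang, Counter()).update(ngrams(text))

    def guess(self, text):
        """Return the language whose profile has the smallest
        out-of-place distance to the text, or None if untrained."""
        doc = ranks(ngrams(text))
        best, best_dist = None, None
        for lang in sorted(self.counts):
            prof = ranks(self.counts[lang])
            # n-grams missing from the profile get the maximum penalty.
            dist = sum(abs(r - prof.get(g, MAX_NGRAMS)) for g, r in doc.items())
            if best_dist is None or dist < best_dist:
                best, best_dist = lang, dist
        return best

    def override(self, text, chosen_lang):
        """Online learning: if the user picked a language other than the
        guess, fold the text into the chosen language's profile."""
        if self.guess(text) != chosen_lang:
            self.learn(chosen_lang, text)
```

A server could call guess() on the input to preselect the source
language and call override() whenever the user picks something else.
Whether raw user input is clean enough to train on this way is an open
question.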


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff