The classic is textcat, based on Cavnar & Trenkle, "N-Gram-Based Text Categorization": http://odur.let.rug.nl/~vannoord/TextCat/ (the site also has a nice overview of the available tools/libraries) and http://software.wise-guys.nl/libtextcat/ . I saw that there is also a pylibtextcat. I have not tried either.
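For reference, the Cavnar & Trenkle approach is small enough to sketch in Python: rank the most frequent character n-grams of each language, then pick the language whose ranking is closest to the input's under the "out-of-place" distance. The following is only a rough illustration under my own assumptions, not the code of textcat, libtextcat or pylibtextcat. The function names, the `TRAINING` samples and the parameter defaults (n-grams of length 1 to 5, top 300) are illustrative choices, and the training texts are far too small for real use. The profiles are built once and reused for every line, which avoids reloading a model per line.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Map each of the top_k most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for token in text.lower().split():
        padded = "_" + token + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ranked[:top_k])}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed maximum penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def guess_language(line, profiles):
    """Return the key of the profile with the smallest out-of-place distance to the line."""
    doc = ngram_profile(line)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical toy training data; real profiles need much larger samples.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is on the table with this and that",
    "da": "den hurtige brune ræv hopper over den dovne hund og katten er på bordet med det og den",
}

# Build the profiles once, then classify each line individually.
PROFILES = {lang: ngram_profile(text) for lang, text in TRAINING.items()}

if __name__ == "__main__":
    for line in ["the dog is on the table", "hunden er på bordet"]:
        print(guess_language(line, PROFILES), line)
```

To add a new language, add another entry to `TRAINING` (or build a profile from a file with `ngram_profile`) and it will take part in the comparison. Very short lines give few n-grams, so results on them should be treated with caution.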
Achim

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jacob Sparre Andersen
Sent: Saturday, July 09, 2011 2:53 PM
To: Moses support
Subject: Re: [Moses-support] Language guessing/ID tool

On Sun, 10 Jul 2011, Tom Hoar wrote:

> Does anyone know a good language guessing/ID tool in C++ or Python
> with the ability to create new models? I've tried mguesser but if I
> pipe in a file, it will ID the entire file, not each individual line.
> Treating each line as a file is slow because it takes several seconds
> to reload the program/model each time.

I think Kevin Scannell's Crúbadán might do the job. IIRC it can identify languages with a high reliability on an even smaller scale than a whole line/sentence.

Kind regards,

Jacob Sparre Andersen
--
Jacob Sparre Andersen Research & Innovation
Vesterbrogade 148K, 1. th.
1620 København V
Danmark

Phone: +45 21 49 08 04
E-mail: [email protected]
Web-site: http://www.jacob-sparre.dk/
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
