The classic is TextCat, based on Cavnar & Trenkle, "N-Gram-Based Text 
Categorization": http://odur.let.rug.nl/~vannoord/TextCat/ (the site also has 
a nice overview of the available tools/libraries). There is also libtextcat: 
http://software.wise-guys.nl/libtextcat/ . I saw that there is a 
pylibtextcat as well. I have not tried either.
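
For reference, a minimal Python sketch of the rank-order ("out-of-place") 
idea from Cavnar & Trenkle. It builds the language profiles once and then 
classifies line by line, so nothing is reloaded per line. This is not the 
libtextcat or pylibtextcat API: the function names (ngram_profile, train, 
guess_lines), the 300-n-gram cutoff, the single "_" padding, and the 
dict-of-sample-texts training format are all assumptions for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Map each of the top_k most frequent character n-grams to its rank.

    Tokens are lowercased and padded with "_" on both sides (a
    simplification of the padding described in the paper).
    """
    counts = Counter()
    for token in text.lower().split():
        padded = "_" + token + "_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, max_penalty):
    """Sum of rank differences; n-grams missing from the language
    profile cost max_penalty each."""
    total = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            total += abs(rank - lang_profile[gram])
        else:
            total += max_penalty
    return total


def train(samples, top_k=300):
    """Build one profile per language from {language: sample_text}."""
    return {lang: ngram_profile(text, top_k=top_k)
            for lang, text in samples.items()}


def guess_lines(lines, models, top_k=300):
    """Yield the closest language for each line (None for blank lines).

    The models are built once by train() and reused for every line.
    """
    for line in lines:
        profile = ngram_profile(line, top_k=top_k)
        if not profile:
            yield None
            continue
        yield min(models,
                  key=lambda lang: out_of_place(profile, models[lang], top_k))
```

To use it on a file, call train() once with a reasonably large sample text 
per language and pass the open file (or sys.stdin) to guess_lines(). Very 
short lines give few n-grams, so expect the guesses to be less reliable 
there.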

Achim 

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of Jacob Sparre Andersen
Sent: Saturday, July 09, 2011 2:53 PM
To: Moses support
Subject: Re: [Moses-support] Language guessing/ID tool

On Sun, 10 Jul 2011, Tom Hoar wrote:

> Does anyone know a good language guessing/ID tool in C++ or Python 
> with the ability to create new models? I've tried mguesser but if I 
> pipe in a file, it will ID the entire file, not each individual line. 
> Treating each line as a file is slow because it takes several seconds 
> to reload the program/model each time.

I think Kevin Scannell's Crúbadán might do the job.  IIRC it can identify 
languages with high reliability on an even smaller scale than a whole 
line/sentence.

Kind regards,

Jacob Sparre Andersen
--
Jacob Sparre Andersen Research & Innovation Vesterbrogade 148K, 1. th.
1620 København V
Danmark

Phone:    +45 21 49 08 04
E-mail:   [email protected]
Web-site: http://www.jacob-sparre.dk/


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
