Thanks! I was looking for exactly this kind of tool a few days ago,
and in the end I settled on Google's CLD (Compact Language Detector),
which was extracted from Chrome's sources:
http://code.google.com/p/chromium-compact-language-detector/

It worked very well. If I have the time, I'll do a comparison
between that and Langmatch.
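
For anyone unfamiliar with the n-gram language maps mentioned in the
announcement quoted below, here is a minimal sketch of the general idea.
It is not Langmatch's or CLD's code or API. It uses the rank-order
("out-of-place") comparison popularised by Cavnar and Trenkle's TextCat,
and every name in it (ngram_profile, out_of_place, guess_language,
TRAINING, MAPS) as well as the toy training text is my own assumption,
made up for illustration.

```python
from collections import Counter


def ngram_profile(text, n=3, top=300):
    """Return the `top` most frequent character n-grams, most frequent first."""
    # Normalise case and whitespace, and pad so word boundaries form n-grams too.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    # Sort by descending count, then alphabetically, so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(sample, language_map, max_penalty=300):
    """Sum of rank differences; n-grams absent from the map cost `max_penalty`."""
    rank = {gram: i for i, gram in enumerate(language_map)}
    return sum(abs(i - rank[gram]) if gram in rank else max_penalty
               for i, gram in enumerate(sample))


def guess_language(text, maps, n=3):
    """Return the key of the map whose ranking is closest to the text's profile."""
    sample = ngram_profile(text, n)
    return min(maps, key=lambda lang: out_of_place(sample, maps[lang]))


# Toy training text (hypothetical, far too small for real use).
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat "
          "sat on the mat with the other animals",
    "de": "der schnelle braune fuchs springt über den faulen hund und "
          "die katze sitzt auf der matte mit den anderen tieren",
}
MAPS = {lang: ngram_profile(text) for lang, text in TRAINING.items()}

print(guess_language("the dog and the cat", MAPS))
```

Real language maps are built from far more text per language, which is
why longer input and better maps give more reliable guesses.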

Thanks and best regards,
Tamas

On Wed, Jan 25, 2012 at 1:47 PM, Tom Hoar
<[email protected]> wrote:
>  This message announces the publication of possibly the best open
>  source language identification tool available... ever.
>
>  https://launchpad.net/langmatch
>
>  Langmatch is a Python command-line tool that guesses the language of a
>  text string of any length (of course longer is better). Langmatch uses
>  language maps (fingerprints or models) from a variety of popular open
>  source tools (such as mguesser and libtextcat) or users can use
>  langmatch to create their own maps of any n-gram length. It can use the
>  451 3-gram models from the Python NLTK. Note that the NLTK language maps
>  have not yet been uploaded to the launchpad.net repository, although the
>  plan is to do so. You can obtain them from the NLTK corpus
>  (nltk_data/corpora/langid), or I'm happy to distribute to you directly
>  under GNU GPL3. The Python code has been optimized for
>  performance. Maps of up to 7 grams run amazingly fast.
>
>  Langmatch seems infinitely configurable. You can run langmatch from the
>  command line or import it directly into your own program. End-of-line,
>  whole-document processing, you name it. It can report its raw scores,
>  or just the voted result. You can pick which maps to use for analysis
>  without moving the maps in/out of the installation folder. Also, the
>  author informally brands his file-opening code
>  "lib-open-my-god-damn-file.py". You get:
>   * stdin/stdout
>   * regular paths and filenames on Windows/Posix
>   * URLs: read files directly via http, ftp, ftps, etc
>   * custom URI types
>   * transparent decompression on local files
>   * iteration of directories for input
>   * substitution of default filenames for output
>   * much, much more
>
>  Langmatch has been tested on Linux Python 2.6, 2.7, 3 and MS Windows
>  Python 2.7 (should work on 2.6 & 3 for Windows and Mac). It is fully
>  Unicode-aware. It is distributed under the GNU GPL v3 license.
>
>  I hope some of you on this list will enjoy this tool.
>
>  Tom
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
