RE: Automatically determine Language of document

Strittmatter Stephan (external) Tue, 27 Nov 2001 23:40:27 -0800

Hi Elie,

> You could try Doug Beeferman's variable-length character n-gram approach
> to identify a language among 13 European ones.
> http://www.dougb.com/ident.html


> If you just have 4 or 5 languages to deal with, you can build your
> own with the most frequent word lists for each language. I have some
> trivial C++ code that does it and can send it to you if you need it.
> The identified language is chosen on a frequency criterion.
> 

At the moment I have only two languages (en, de), but this could increase,
though I don't expect more than the 4 or 5 you mention.
It would be great if you could send me your example code.
I will probably try to port it to Java.
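
For reference, the word-list approach described above might look roughly
like this in Java. This is only a minimal sketch, not the C++ code
mentioned in the quoted message: the class name LanguageGuesser, the method
guess, and the two short word lists are made up for illustration, and the
"frequency criterion" is assumed to mean that the language whose common
words occur most often in the text wins.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class LanguageGuesser {
    // Hypothetical, hand-picked lists of frequent words per language.
    // Real lists would be longer and derived from a corpus.
    private static final Map<String, Set<String>> COMMON_WORDS = new LinkedHashMap<>();
    static {
        COMMON_WORDS.put("en", new HashSet<>(Arrays.asList(
            "the", "and", "of", "to", "is", "in", "that", "it", "for", "with")));
        COMMON_WORDS.put("de", new HashSet<>(Arrays.asList(
            "der", "die", "das", "und", "ist", "nicht", "ein", "zu", "mit", "von")));
    }

    // Returns the code of the language whose frequent words occur most
    // often in the text, or "unknown" if no listed word occurs at all.
    public static String guess(String text) {
        // Split on anything that is not a letter; lower-case for matching.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : COMMON_WORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            // Strictly greater: on a tie the language listed first is kept.
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("The cat is in the house"));        // en
        System.out.println(guess("Der Hund ist nicht in dem Haus")); // de
    }
}
```

Adding a further language would only mean adding another entry to
COMMON_WORDS. Very short inputs may contain none of the listed words, in
which case this sketch returns "unknown" rather than guessing.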

Thanks in advance,

Stephan Strittmatter
