Hi Elie,

> You could try Doug Beeferman's variable-length character n-gram approach
> to identify a language among 13 European ones.
> http://www.dougb.com/ident.html
> If you just have 4 or 5 languages to deal with, you can build your
> own with the most frequent word lists for each language. I have some
> trivial C++ code that does it and can send it to you if you need.
> Identified language is chosen on a frequency criterion.

At the moment I have only two languages (en, de). This could increase, but I think to no more than your 4 to 5.

It would be great if you could send me your example code. I will probably try to port it to Java.

Thanks in advance,
Stephan Strittmatter
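
For reference, a minimal Java sketch of the frequent-word-list idea quoted above. It is not the C++ code mentioned in the quote, which is not shown in this thread. The class name `WordListLanguageGuesser`, the method `guess`, the short word lists, and the "most matching tokens wins" rule are all assumptions made for illustration; the quote only says the language is chosen on a frequency criterion.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class WordListLanguageGuesser {
    // Hypothetical, hand-picked lists of very frequent words.
    // Real lists would be longer and derived from a corpus.
    private static final Map<String, Set<String>> COMMON_WORDS = new LinkedHashMap<>();
    static {
        COMMON_WORDS.put("en", new HashSet<>(Arrays.asList(
                "the", "and", "of", "to", "is", "in", "that", "it", "for", "with")));
        COMMON_WORDS.put("de", new HashSet<>(Arrays.asList(
                "der", "die", "das", "und", "ist", "nicht", "mit", "ein", "zu", "von")));
    }

    /**
     * Returns the code of the language whose word list matches the most
     * tokens in the text, or "unknown" if no token matches any list.
     * On a tie, the language registered first wins.
     */
    public static String guess(String text) {
        // Lower-case the text and split on anything that is not a letter.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : COMMON_WORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("The cat is in the house"));
        System.out.println(guess("Der Hund ist nicht mit der Katze"));
    }
}
```

Adding a language in this sketch means adding one more entry to `COMMON_WORDS`. Very short inputs may contain none of the listed words, in which case `guess` returns "unknown" rather than picking a language at random.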
