Have also a look at ngramj at sourceforge.net. I use this Java library.

First I check the language-Meta tag of html page. If it is not avaiable I
use ngramj 
to "guess" it.

Probably this library could also be added to the Lucene Contributions List

Stephan

> look for Ted Dunning algorithm on the web.
> 
> 
> -neil
> 
> -----Original Message-----
> From: Randy Darling [mailto:[EMAIL PROTECTED]
> Sent: 17 juin, 2003 16:41
> To: Lucene Users List
> Subject: Any tools to detect language of document
> 
> 
> 
> I am attempting to come up with an automated way to
> select which language analyzer to use on a document.
> 
> Anyone know of any algorithms available to detect
> what language the document may be written in?
> 
> Are there any special Analyzers that attempt to support
> multiple languages?
> 
> 
> Thanks,
> Randy
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

-- 
+++ GMX - Mail, Messaging & more  http://www.gmx.net +++
Bitte l�cheln! Fotogalerie online mit GMX ohne eigene Homepage!


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to