Have also a look at ngramj at sourceforge.net. I use this Java library. First I check the language-Meta tag of html page. If it is not avaiable I use ngramj to "guess" it.
Probably this library could also be added to the Lucene Contributions List Stephan > look for Ted Dunning algorithm on the web. > > > -neil > > -----Original Message----- > From: Randy Darling [mailto:[EMAIL PROTECTED] > Sent: 17 juin, 2003 16:41 > To: Lucene Users List > Subject: Any tools to detect language of document > > > > I am attempting to come up with an automated way to > select which language analyzer to use on a document. > > Anyone know of any algorithms available to detect > what language the document may be written in? > > Are there any special Analyzers that attempt to support > multiple languages? > > > Thanks, > Randy > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > -- +++ GMX - Mail, Messaging & more http://www.gmx.net +++ Bitte l�cheln! Fotogalerie online mit GMX ohne eigene Homepage! --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
