Thanks for the information. I did some quick tests of the ngramj program and it seems to work.
Randy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 18, 2003 2:30 AM
To: Lucene Users List
Subject: RE: Any tools to detect language of document

Also have a look at ngramj at sourceforge.net. I use this Java library.
First I check the language meta tag of the HTML page. If it is not available, I use ngramj to "guess" the language.

This library could probably also be added to the Lucene Contributions List.

Stephan

> Look for Ted Dunning's algorithm on the web.
>
> -neil
>
> -----Original Message-----
> From: Randy Darling [mailto:[EMAIL PROTECTED]
> Sent: 17 June, 2003 16:41
> To: Lucene Users List
> Subject: Any tools to detect language of document
>
> I am attempting to come up with an automated way to
> select which language analyzer to use on a document.
>
> Anyone know of any algorithms available to detect
> what language the document may be written in?
>
> Are there any special Analyzers that attempt to support
> multiple languages?
>
> Thanks,
> Randy

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
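
The two-step approach Stephan describes in the quoted message (check the HTML language meta tag first, fall back to an n-gram guesser only when the tag is missing) could be sketched roughly as below. This is only an illustration: the class name `LanguageSelector`, its methods, and the `guesser` parameter are hypothetical, and the guesser is passed in as a plain function so that no particular ngramj API is assumed. The regex is also a simplification that expects the `name`/`http-equiv` attribute to come before `content`.

```java
import java.util.Locale;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or ngramj.
final class LanguageSelector {

    // Matches e.g. <meta name="language" content="de"> or
    // <meta http-equiv="content-language" content="en-US">.
    // Assumption: the name/http-equiv attribute precedes content.
    private static final Pattern META_LANG = Pattern.compile(
        "<meta\\s+(?:name|http-equiv)\\s*=\\s*[\"'](?:content-)?language[\"']"
            + "\\s+content\\s*=\\s*[\"']([A-Za-z-]+)[\"']",
        Pattern.CASE_INSENSITIVE);

    private LanguageSelector() {}

    // Returns the lower-cased language code from the meta tag,
    // or null if the page declares none.
    static String fromMetaTag(String html) {
        Matcher m = META_LANG.matcher(html);
        return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : null;
    }

    // Prefers the declared language; otherwise delegates to the supplied
    // guesser (for example, a wrapper around an n-gram based detector).
    static String detect(String html, Function<String, String> guesser) {
        String declared = fromMetaTag(html);
        return declared != null ? declared : guesser.apply(html);
    }
}
```

The returned code could then be used to pick the matching Lucene analyzer, for instance through a map from language code to analyzer instance.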
