For what it worths, I did something similar in my BidiAnalyzer so I can index both Hebrew/Semitic texts and English/Latin words without switching analyzers, giving each the proper treatment. I did it simply by testing the first char and looking at its numeric value - so it falls between Hebrew Aleph and Taph then its Hebrew, else its Latin. I wonder how you would spot a French word in an English text for instance (aren't there parallel words?)
Itamar. -----Original Message----- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Friday, March 14, 2008 3:34 PM To: java-user@lucene.apache.org Subject: Re: Language identification ?? I think Karl Wettin has one that is a patch in JIRA. Try searching there. On Mar 14, 2008, at 1:28 AM, Raghu Ram wrote: > Hi all, > I guess this question is a bit off the track. Are there any language > identification modules inside Lucene ??? If not can somebody please > suggest me a good one. > > Thank You. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]