RE: Language identification ??

Itamar Syn-Hershko Fri, 14 Mar 2008 07:31:22 -0700

For what it's worth, I did something similar in my BidiAnalyzer so that I can
index both Hebrew/Semitic texts and English/Latin words without switching
analyzers, giving each the proper treatment. I did it simply by testing the
first char and looking at its numeric value: if it falls between Hebrew
Aleph and Tav then it's Hebrew, else it's Latin. I wonder how you would spot
a French word in an English text, for instance (aren't there words common to
both languages?)
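
A minimal sketch of that first-character test, for illustration only. The
class and method names here are hypothetical and this is not the actual
BidiAnalyzer code; it assumes the token is a Java String and that "Hebrew"
means the basic letter range U+05D0 (Aleph) to U+05EA (Tav), so vowel points
and presentation forms are not covered.

```java
final class ScriptSniffer {
    // Basic Hebrew letters occupy U+05D0 (Aleph) through U+05EA (Tav).
    private static final char ALEPH = '\u05D0';
    private static final char TAV = '\u05EA';

    private ScriptSniffer() {}

    // Returns true when the token's first char is a basic Hebrew letter.
    // Anything else (including null or empty input) is treated as non-Hebrew,
    // which the caller would then handle as Latin.
    static boolean startsWithHebrew(String token) {
        if (token == null || token.isEmpty()) {
            return false;
        }
        char first = token.charAt(0);
        return first >= ALEPH && first <= TAV;
    }
}
```

A caller could route each token to Hebrew or Latin handling based on the
returned boolean. This only separates scripts; it cannot tell apart two
languages that share an alphabet, such as French and English.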


Itamar.

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
Sent: Friday, March 14, 2008 3:34 PM
To: java-user@lucene.apache.org
Subject: Re: Language identification ??

I think Karl Wettin has one that is a patch in JIRA.  Try searching there.

On Mar 14, 2008, at 1:28 AM, Raghu Ram wrote:

> Hi all,
> I guess this question is a bit off track. Are there any language
> identification modules inside Lucene? If not, can somebody please
> suggest a good one?
>
> Thank You.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




