Uwe,
100% correct.

On Thu, Oct 8, 2009 at 4:56 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> I think the point of the lowercase filter in the Arabic analyzers is not
> really to index mixed-language texts. It is more for the case where some
> words are embedded in the Arabic content (product names, etc.), which happens
> often. You also see this often in Japanese texts. For these embedded English
> fragments you really need no stop word list. And if there is a stop word in
> them, it is not a real stop word for the target language; it may carry
> additional information. Stop words are mostly removed because they are
> needless (they appear in every text). But if you have an Arabic sentence where
> "the" appears next to an English word, it is more important than all the
> "the"s in this mail.
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>
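
To make the point concrete, here is a minimal sketch in plain Python. It is not the Lucene API: the `analyze` function and the three-word stop list are made up for illustration only. It lowercases every token but removes only Arabic stop words, so an embedded English "the" survives as a searchable term.

```python
# Hypothetical stop list for illustration; NOT Lucene's actual Arabic stop set.
ARABIC_STOP_WORDS = {"في", "من", "على"}


def analyze(text):
    """Whitespace-tokenize, lowercase, then drop Arabic stop words only."""
    tokens = text.split()
    # str.lower() folds Latin letters; Arabic has no case, so it is unchanged.
    lowered = [t.lower() for t in tokens]
    # No English stop list is applied, so "the" is kept on purpose.
    return [t for t in lowered if t not in ARABIC_STOP_WORDS]


if __name__ == "__main__":
    # Arabic sentence with an embedded English band name.
    print(analyze("اشتريت ألبوم The Who من المتجر"))
```

With this setup a query for "the who" can still match the embedded name, while the Arabic stop word is dropped as usual.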

