Uwe, 100% correct.

On Thu, Oct 8, 2009 at 4:56 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> I think the idea of the lowercase filter in the Arabic analyzers is not to
> really index mixed-language texts. It is more for the case where you have
> some words within the Arabic content (like product names), which happens
> often. You also see this often in Japanese texts. And for these embedded
> English fragments you really need no stop word list. If there is a stop word
> in such a fragment, it is not a real stop word for the target language; it
> may be additional information. Stop words are removed mostly because they
> are needless (they appear in every text). But if "the" appears next to an
> English word in an Arabic sentence, it is more important than all the "the"s
> in this mail.
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
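To make the point concrete, here is a minimal plain-Java sketch of that behaviour. It is not Lucene's API: the class `EmbeddedLatinDemo`, the method `analyze`, the whitespace tokenization, and the three-word Arabic stop set are all hypothetical stand-ins. Every token is lowercased (which only matters for cased scripts like Latin), but only Arabic stop words are dropped, so an embedded English "the" survives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class EmbeddedLatinDemo {
    // Hypothetical, tiny Arabic stop set for illustration only (not Lucene's list):
    // "fi" (in), "min" (from), "ala" (on), written as Unicode escapes.
    private static final Set<String> ARABIC_STOP_WORDS =
            Set.of("\u0641\u064A", "\u0645\u0646", "\u0639\u0644\u0649");

    // Naive whitespace tokenizer + lowercasing + Arabic-only stop word removal.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Lowercasing affects embedded Latin words; Arabic letters have no case.
            String lower = token.toLowerCase(Locale.ROOT);
            // No English stop list: "the" next to an English word is kept on purpose.
            if (!ARABIC_STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // An Arabic stop word followed by an embedded English product name.
        System.out.println(analyze("\u0641\u064A The Lord of the Rings"));
    }
}
```

The design choice mirrors the argument above: case folding is cheap and helps match embedded Latin terms, while an English stop list would throw away words that are rare, and therefore informative, in Arabic text.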