Re: Arabic Analyzer: possible bug

Basem Narmok Thu, 08 Oct 2009 13:20:01 -0700

Uwe,
100% correct.

On Thu, Oct 8, 2009 at 4:56 PM, Uwe Schindler <[email protected]> wrote:
> I think the point of the lowercase filter in the Arabic analyzers is not
> really to index mixed-language texts. It is more for the case where a few
> foreign words (product names, etc.) are embedded in the Arabic content, which
> happens often. You also see this often in Japanese texts. For these embedded
> English fragments you really do not need a stop word list. And if such a
> fragment contains a stop word, it is not a real stop word for the target
> language; it may carry additional information. Stop words are removed mostly
> because they are needless (they appear in every text). But if you have one
> Arabic sentence where "the" appears next to an English word, it is more
> important than all the "the"s in this mail.
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
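
The behaviour described in the quoted message can be sketched roughly as
follows. This is only an illustration in plain Python, not Lucene code: the
function name analyze and the three-word ARABIC_STOP_WORDS set are
hypothetical, and a real analyzer would use a much longer stop word list and a
proper tokenizer. The sketch lowercases every token but removes only Arabic
stop words, so an embedded English "the" survives.

```python
# Hypothetical, tiny Arabic stop word set for illustration only;
# a real analyzer would ship a much longer list.
ARABIC_STOP_WORDS = {"في", "من", "على"}


def analyze(text):
    """Whitespace-tokenize, lowercase, and drop Arabic stop words only."""
    tokens = []
    for raw in text.split():
        # Lowercasing leaves Arabic script unchanged and normalizes
        # embedded Latin-script words such as product names.
        token = raw.lower()
        if token in ARABIC_STOP_WORDS:
            continue  # only Arabic stop words are removed
        tokens.append(token)
    return tokens


# Arabic sentence with an embedded English title.
print(analyze("فيلم The Matrix في السينما"))
```

In this sketch the Arabic stop word is dropped, while "the" is kept (in
lowercase) next to "matrix", because no English stop word list is applied.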


