DM, there are no upper or lower cases in Arabic, so don't worry about that. The stop word list does need some corrections, though, and may be missing some common Arabic stop words.
Best,

On Thu, Oct 8, 2009 at 4:14 PM, DM Smith <dmsmith...@gmail.com> wrote:
> Robert,
> Thanks for the info.
> As I said, I am illiterate in Arabic. So I have another, perhaps
> nonsensical, question:
> Does the stop word list have every combination of upper/lower case for each
> Arabic word in the list? (i.e. is it fully de-normalized?) Or should it come
> after LowerCaseFilter?
> -- DM
> On Oct 8, 2009, at 8:37 AM, Robert Muir wrote:
>
> DM, this isn't a bug.
>
> The arabic stopwords are not normalized.
>
> but for persian, i normalized the stopwords. mostly because i did not want
> to have to create variations with farsi yah versus arabic yah for each one.
>
> On Thu, Oct 8, 2009 at 7:24 AM, DM Smith <dmsmith...@gmail.com> wrote:
>>
>> I'm wondering if there is a bug in ArabicAnalyzer in 2.9. (I don't know
>> Arabic or Farsi, but have some texts to index in those languages.)
>> The tokenizer/filter chain for ArabicAnalyzer is:
>>         TokenStream result = new ArabicLetterTokenizer( reader );
>>         result = new StopFilter( result, stoptable );
>>         result = new LowerCaseFilter(result);
>>         result = new ArabicNormalizationFilter( result );
>>         result = new ArabicStemFilter( result );
>>
>>         return result;
>>
>> Shouldn't the StopFilter come after ArabicNormalizationFilter?
>>
>> As a comparison the PersianAnalyzer has:
>>     TokenStream result = new ArabicLetterTokenizer(reader);
>>     result = new LowerCaseFilter(result);
>>     result = new ArabicNormalizationFilter(result);
>>     /* additional persian-specific normalization */
>>     result = new PersianNormalizationFilter(result);
>>     /*
>>      * the order here is important: the stopword list is normalized with the
>>      * above!
>>      */
>>     result = new StopFilter(result, stoptable);
>>
>>     return result;
>>
>>
>> Thanks,
>> DM
>
>
> --
> Robert Muir
> rcm...@gmail.com
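
For anyone following the ordering question, here is a minimal sketch of why the position of the stop filter depends on how the stop list is stored. It deliberately does not use the Lucene classes from the thread. `FilterOrderSketch`, `normalize`, `normalizeAll`, and `removeStops` are hypothetical stand-ins, and the single normalization rule (folding Farsi yeh U+06CC to Arabic yeh U+064A) and the two-word token list are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterOrderSketch {

    // Hypothetical stand-in for a normalization filter:
    // folds Farsi yeh (U+06CC) into Arabic yeh (U+064A).
    public static String normalize(String token) {
        return token.replace('\u06CC', '\u064A');
    }

    // Applies normalize() to every token, preserving order.
    public static List<String> normalizeAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(normalize(t));
        }
        return out;
    }

    // Hypothetical stand-in for a stop filter: drops exact matches only.
    public static List<String> removeStops(List<String> tokens, Set<String> stops) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stops.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: the stop list is stored in normalized form (Arabic yeh).
        Set<String> stops = new HashSet<>(Arrays.asList("\u0641\u064A"));

        // Input: the same word written with Farsi yeh, plus one content word.
        List<String> tokens = Arrays.asList("\u0641\u06CC", "\u0643\u062A\u0627\u0628");

        // Stop filter first: the variant spelling is not in the list, so it survives.
        List<String> stopFirst = normalizeAll(removeStops(tokens, stops));

        // Normalization first: the variant is folded and then matched and removed.
        List<String> normalizeFirst = removeStops(normalizeAll(tokens), stops);

        System.out.println("stop filter first, tokens kept: " + stopFirst.size());
        System.out.println("normalization first, tokens kept: " + normalizeFirst.size());
    }
}
```

With a normalized stop list, the stop filter only matches reliably after normalization, which is the PersianAnalyzer arrangement. With an un-normalized list, as Robert describes for Arabic, the filter has to see the tokens before they are normalized.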