Basem, by any chance would you be willing to help improve it for us?

On Thu, Oct 8, 2009 at 9:20 AM, Basem Narmok <nar...@gmail.com> wrote:
> DM, there are no upper/lower cases in Arabic, so don't worry, but the
> stop word list needs some corrections and may miss some common/stop
> Arabic words.
>
> Best,
>
> On Thu, Oct 8, 2009 at 4:14 PM, DM Smith <dmsmith...@gmail.com> wrote:
> > Robert,
> > Thanks for the info.
> > As I said, I am illiterate in Arabic. So I have another, perhaps
> > nonsensical, question:
> > Does the stop word list have every combination of upper/lower case for each
> > Arabic word in the list? (i.e. is it fully de-normalized?) Or should it come
> > after LowerCaseFilter?
> > -- DM
> > On Oct 8, 2009, at 8:37 AM, Robert Muir wrote:
> >
> > DM, this isn't a bug.
> >
> > The Arabic stopwords are not normalized.
> >
> > But for Persian, I normalized the stopwords, mostly because I did not want
> > to have to create variations with Farsi yah versus Arabic yah for each one.
> >
> > On Thu, Oct 8, 2009 at 7:24 AM, DM Smith <dmsmith...@gmail.com> wrote:
> >>
> >> I'm wondering if there is a bug in ArabicAnalyzer in 2.9. (I don't know
> >> Arabic or Farsi, but have some texts to index in those languages.)
> >> The tokenizer/filter chain for ArabicAnalyzer is:
> >> TokenStream result = new ArabicLetterTokenizer( reader );
> >> result = new StopFilter( result, stoptable );
> >> result = new LowerCaseFilter(result);
> >> result = new ArabicNormalizationFilter( result );
> >> result = new ArabicStemFilter( result );
> >>
> >> return result;
> >>
> >> Shouldn't the StopFilter come after ArabicNormalizationFilter?
> >>
> >> As a comparison the PersianAnalyzer has:
> >> TokenStream result = new ArabicLetterTokenizer(reader);
> >> result = new LowerCaseFilter(result);
> >> result = new ArabicNormalizationFilter(result);
> >> /* additional persian-specific normalization */
> >> result = new PersianNormalizationFilter(result);
> >> /*
> >>  * the order here is important: the stopword list is normalized with the
> >>  * above!
> >>  */
> >> result = new StopFilter(result, stoptable);
> >>
> >> return result;
> >>
> >> Thanks,
> >> DM
> >
> > --
> > Robert Muir
> > rcm...@gmail.com

--
Robert Muir
rcm...@gmail.com
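To illustrate why the position of the StopFilter has to match how the stopword list is stored, here is a minimal, self-contained sketch in plain Java. It does not use Lucene: the class StopOrderDemo and all of its methods are hypothetical, and normalize() is a deliberately simplified stand-in for ArabicNormalizationFilter/PersianNormalizationFilter that only maps Farsi yeh (U+06CC) and alef maksura (U+0649) to Arabic yeh (U+064A). The stopword set holds the normalized spelling of the word "fi" (U+0641 U+064A), while the input token spells it with Farsi yeh.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class StopOrderDemo {
    // Hypothetical, simplified normalizer (NOT Lucene's ArabicNormalizationFilter
    // or PersianNormalizationFilter): maps Farsi yeh (U+06CC) and alef maksura
    // (U+0649) to Arabic yeh (U+064A); every other character is kept unchanged.
    static String normalize(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (char c : token.toCharArray()) {
            if (c == '\u06CC' || c == '\u0649') {
                sb.append('\u064A');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Stand-in for StopFilter: drops tokens that appear verbatim in the stopword set.
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    static List<String> normalizeAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(normalize(t));
        }
        return out;
    }

    // Stop filtering first, then normalization (the ArabicAnalyzer order quoted above).
    static List<String> stopThenNormalize(List<String> tokens, Set<String> stopwords) {
        return normalizeAll(removeStopwords(tokens, stopwords));
    }

    // Normalization first, then stop filtering (the PersianAnalyzer order quoted above).
    static List<String> normalizeThenStop(List<String> tokens, Set<String> stopwords) {
        return removeStopwords(normalizeAll(tokens), stopwords);
    }

    public static void main(String[] args) {
        // Stopword list stored in normalized form: "fi" spelled with Arabic yeh.
        Set<String> stopwords = Set.of("\u0641\u064A");
        // Input: "fi" spelled with Farsi yeh, plus an ordinary content word ("kitab").
        List<String> tokens = List.of("\u0641\u06CC", "\u0643\u062A\u0627\u0628");

        System.out.println(stopThenNormalize(tokens, stopwords).size()); // prints 2
        System.out.println(normalizeThenStop(tokens, stopwords).size()); // prints 1
    }
}
```

With a normalized stoplist, filtering before normalization lets the Farsi-yeh variant slip through (2 tokens), while normalizing first removes it (1 token). Conversely, an unnormalized list, which is how Robert describes the Arabic one, only matches surface forms, so it has to be applied before normalization rewrites them; that appears to be the rationale for the two analyzers using different orders.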