Have you looked at the existing Arabic analyzer in contrib? I like your analyzer, but glancing at your code I think you can get the same behavior with the existing one (it also has stopwords and stemming, but you can disable those). Let me know if I am missing something!
wrt Farsi, I wouldn't recommend using an Arabic analyzer. For example, on the Hamshahri TREC data:

simpleanalyzer: Average Precision: 0.374
arabicanalyzer: Average Precision: 0.316 <-- inappropriate stemming/stopwords
persianalyzer: Average Precision: 0.481 <-- I can contrib this if someone needs it.

thanks,
robert

On Sun, May 3, 2009 at 2:09 AM, Ahmed Al-Obaidy <ahmad_aloba...@yahoo.com> wrote:

> Well I don't know really... but it shouldn't be hard to support it.
>
> --- On *Sun, 5/3/09, DM Smith <dmsmith...@gmail.com>* wrote:
>
> From: DM Smith <dmsmith...@gmail.com>
> Subject: Re: ArabicAnalyzer
> To: java-dev@lucene.apache.org
> Date: Sunday, May 3, 2009, 4:05 AM
>
> On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote:
>
> I've written a simple (yet useful) ArabicAnalyzer, ArabicTokenizer and
> ArabicFilter. It can handle Arabic text very well.
>
> I've tested it with a large set of Arabic documents and it worked OK both in
> terms of accuracy and performance.
>
> The code is released under the Apache 2.0 license. And I would be very happy if
> you include it with the code tree.
>
> Sounds super. Do you know if it will handle Farsi as well?
>
> -- DM Smith

--
Robert Muir
rcm...@gmail.com