Hmm, I didn't know about it. I only knew about a GPLed one, which 
didn't perform well for me.

I will test the existing one, but I think it is much better than mine. 
So, I think I've reinvented the wheel, and it seems it is not even rounder :D

cheers, 

--- On Sun, 5/3/09, Robert Muir <rcm...@gmail.com> wrote:

From: Robert Muir <rcm...@gmail.com>
Subject: Re: ArabicAnalyzer
To: java-dev@lucene.apache.org
Cc: dmsmith...@gmail.com
Date: Sunday, May 3, 2009, 9:56 AM

have you looked at the existing ar analyzer in contrib? 
I like your analyzer but glancing at your code I think you can get the same 
behavior with the existing one (it also has stopwords & stemming but you can 
disable that). lemme know if i am missing something!


wrt farsi i wouldn't recommend using an arabic analyzer.
for example, on the hamshahri trec data:

simpleanalyzer: Average Precision: 0.374
arabicanalyzer: Average Precision: 0.316 <-- inappropriate stemming/stopwords
persianalyzer:  Average Precision: 0.481 <-- i can contrib this if someone needs it.

thanks,
robert

On Sun, May 3, 2009 at 2:09 AM, Ahmed Al-Obaidy <ahmad_aloba...@yahoo.com> wrote:


Well I don't know really... but it shouldn't be hard to support it.

--- On Sun, 5/3/09, DM Smith <dmsmith...@gmail.com> wrote:


From: DM Smith <dmsmith...@gmail.com>
Subject: Re: ArabicAnalyzer

To: java-dev@lucene.apache.org
Date: Sunday, May 3, 2009, 4:05 AM


On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote:


I've written a simple (yet useful) ArabicAnalyzer, ArabicTokenizer and 
ArabicFilter. It can handle Arabic text very well. 

I've tested it with a large set of Arabic documents and it worked OK, both in 
terms of accuracy and performance.


The code is released under the Apache 2.0 license, and I would be very happy if 
you included it in the code tree.

Sounds super. Do you know if it will handle Farsi as well?

-- DM Smith



      


-- 
Robert Muir
rcm...@gmail.com




      
