Re: ArabicAnalyzer

Robert Muir Sat, 02 May 2009 23:56:44 -0700

Have you looked at the existing Arabic analyzer in contrib?
I like your analyzer, but glancing at your code I think you can get the same
behavior with the existing one (it also has stopwords and stemming, but you
can disable those). Let me know if I am missing something!


wrt Farsi, I wouldn't recommend using an Arabic analyzer.
For example, on the Hamshahri TREC data:

simpleanalyzer:  Average Precision: 0.374
arabicanalyzer:  Average Precision: 0.316  <-- inappropriate stemming/stopwords
persiananalyzer: Average Precision: 0.481  <-- I can contribute this if someone needs it.
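
For anyone unfamiliar with the metric, here is a minimal sketch of how
average precision for a single query is usually computed. It is not the
benchmark code that produced the numbers above; the class and method names
are hypothetical, and a reported figure like those above would presumably
be this value averaged over all topics in the collection.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class AveragePrecision {

    // ranked:   document ids in the order the engine returned them
    // relevant: document ids judged relevant for this query
    static double of(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0; // nothing to find, so define the score as 0
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                // precision at this rank: relevant docs seen so far / rank
                sum += (double) hits / (i + 1);
            }
        }
        // relevant docs that were never retrieved contribute 0 to the sum
        return sum / relevant.size();
    }
}
```

An analyzer that stems or drops the wrong words pushes relevant documents
down the ranking (or out of it), which lowers each precision-at-rank term
and therefore the final score.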

thanks,
robert

On Sun, May 3, 2009 at 2:09 AM, Ahmed Al-Obaidy <[email protected]> wrote:

> Well I don't know really... but it shouldn't be hard to support it.
>
> --- On *Sun, 5/3/09, DM Smith <[email protected]>* wrote:
>
>
> From: DM Smith <[email protected]>
> Subject: Re: ArabicAnalyzer
> To: [email protected]
> Date: Sunday, May 3, 2009, 4:05 AM
>
>
>
> On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote:
>
> I've written a simple (yet useful) ArabicAnalyzer, ArabicTokenizer and
> ArabicFilter. It can handle Arabic text very well.
>
> I've tested it with a large set of Arabic documents and it worked OK in
> terms of both accuracy and performance.
>
> The code is released under the Apache 2.0 license, and I would be very
> happy if you included it in the code tree.
>
>
> Sounds super. Do you know if it will handle Farsi as well?
>
> -- DM Smith
>
>
>


-- 
Robert Muir
[email protected]
