Re: ArabicAnalyzer

DM Smith Sun, 03 May 2009 04:37:40 -0700


On May 3, 2009, at 2:56 AM, Robert Muir wrote:

have you looked at the existing ar analyzer in contrib?
I like your analyzer but glancing at your code I think you can get the same behavior with the existing one (it also has stopwords & stemming but you can disable that). lemme know if i am missing something!
wrt farsi i wouldnt recommend using an arabic analyzer
for example on hamshari trec data:

simpleanalyzer: Average Precision: 0.374
arabicanalyzer: Average Precision: 0.316 <-- inappropriate stemming/stopwords
persianalyzer: Average Precision: 0.481 <-- i can contrib this if someone needs it.

Please do contribute it. While I don't know Persian at all, the program I am working on is translated into Farsi and we have several indexed texts.

thanks,
robert
On Sun, May 3, 2009 at 2:09 AM, Ahmed Al-Obaidy <[email protected]> wrote:
Well I don't know really... but it shouldn't be hard to support it.

--- On Sun, 5/3/09, DM Smith <[email protected]> wrote:

From: DM Smith <[email protected]>
Subject: Re: ArabicAnalyzer
To: [email protected]
Date: Sunday, May 3, 2009, 4:05 AM



On May 2, 2009, at 6:43 PM, Ahmed Al-Obaidy wrote:
I've written a simple (yet useful) ArabicAnalyzer, ArabicTokenizer and ArabicFilter. It can handle Arabic text very well.
I've tested it with a large set of Arabic documents and it worked OK both in terms of accuracy and performance.
The code is released under the Apache 2.0 license. And I would be very happy if you include it with the code tree.
Sounds super. Do you know if it will handle Farsi as well?

-- DM Smith





--
Robert Muir
[email protected]
