Hi all,

I have written a Lucene Analyzer for Arabic. You will find it here: http://perso.wanadoo.fr/pierrick.brihaye/ArabicAnalyzer.jar (provisional address; is anybody interested in hosting it?)

This work is still in beta stage, but it gives quite good results :-)

In order to make it work, you need:

1) a 1.4+ JVM (because of the native support for regular expressions, which are heavily used in the program; I've been too lazy to use an external package)
2) Apache Jakarta Commons-Collections: http://jakarta.apache.org/commons/collections.html
3) a recent Lucene distribution ;-)

All this work is based on Tim Buckwalter's amazing Arabic Morphological Analyzer Version 1.0 (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49), originally written in Perl and released under the GPL.

The jar contains:

a) the compiled classes
b) the required data files (dictionaries and compatibility tables)
c) 2 command-line test programs
d) 3 test documents with different encodings
e) the source code
f) a README file that will give you a bit more information :-)

To Lucene developers: I plan to offer this work to Lucene (see the jar hierarchy... and the source file headers ;-). Any objections?

Feedback is very welcome: there are quite a lot of unresolved issues, with the analyzer itself as well as with Lucene.

mE AlslAmap,

cheers,

p.b.
