On Sat, 17 Apr 2004, Mohammed Elzubeir wrote:

> On Sat, 2004-04-17 at 13:21, Kevin Atkinson wrote:
> > Hi, I am contacting you because you are the author of Duali.
> >
> > Would you be interested in working with me to add support for Arabic to
> > Aspell (http://aspell.net)?
> >
>
> Very interested. It was my original intention to do it this way, but I
> have received very little feedback during my initial attempts to
> establish contact and so I have abandoned that to work on Duali.

I searched my mailbox and found some brief discussion. Back then Aspell
lacked affix support and Unicode support. This has now changed.

> > I believe Aspell can handle it now, however, I am not sure. The Arabic
> > encoding in Unicode is very complicated and I do not understand all the
> > issues involved. Would you be willing to explain to me what I need to
> > know about the encoding for spell checking? In particular should I expect
> > the "Arabic Presentation Forms" to be used? Should words in the
> > dictionary be encoded with the presentation forms? Etc.
>
> Presentation forms are just what they say they are, for visual
> presentation. So, they would not be used. The dictionary is encoded in
> UTF-8 format (at least that's what I use for Duali).

Aspell supports Unicode, but internally it is still 8-bit. So the first
order of business is to establish an internal encoding. Is iso-8859-6
sufficient? If not, a new character set can be made up. You can use up to
210 characters (128 upper 8-bit, 30 control, 52 Latin letters). If you
could tell me which parts of the Unicode block 0600-06FF Arabic needs for
words, I can create a mapping for you.

> > Once we establish that Aspell can indeed handle Arabic the next thing to do
> > is to convert the "prefixes, suffixes, etc" into an affix file.
> >
>
> You may want to have a look at the lexicon class [1] in CVS. Those are
> the main methods performed in Duali.

OK. That looks a lot like Aspell affix code. I believe Aspell can now
handle it. However, the affix data needs to be converted into a single
affix file. See
http://aspell.sourceforge.net/devel-doc/man/Affix-Compression.html

> > If you'd rather spend your effort working on Duali I fully understand.
>
> Our (Arabeyes') goal (and mine) are to make Arabic spell checking
> available to as big of an audience as possible. I believe adding it to
> aspell would make it more reachable.
> Having said that, I don't plan to
> completely abandon Duali itself, but would be more than happy to
> actively contribute to Arabic spell checking in aspell.

OK, great.

> P.S. Please do contact me via the 'developer' [2] list as some people
> may be interested to know about such issues.

Will do.

---
http://kevin.atkinson.dhs.org

_______________________________________________
Developer mailing list
[EMAIL PROTECTED]
http://lists.arabeyes.org/mailman/listinfo/developer
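
To make the internal-encoding question raised in this message concrete, here is a minimal Python sketch. It is not Aspell code. It checks which characters of a word list fall outside ISO-8859-6 (using Python's standard `iso8859_6` codec) and, if any do, assigns each distinct character a made-up slot in the upper 8-bit range. The function names and the slot layout are hypothetical, and for simplicity only the 128 upper slots are used rather than the full 210 mentioned in the message. A real mapping would depend on which parts of 0600-06FF Arabic words actually need.

```python
def uncovered_by(codec, chars):
    """Return the characters that the given 8-bit codec cannot encode."""
    missing = []
    for ch in chars:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def assign_slots(chars, first_slot=0x80, last_slot=0xFF):
    """Assign each distinct character one byte value (hypothetical layout)."""
    distinct = sorted(set(chars))
    capacity = last_slot - first_slot + 1
    if len(distinct) > capacity:
        raise ValueError(
            f"{len(distinct)} characters do not fit in {capacity} slots"
        )
    return {ch: first_slot + i for i, ch in enumerate(distinct)}


def to_internal(word, table):
    """Encode a Unicode word as one byte per character using the table."""
    return bytes(table[ch] for ch in word)


def from_internal(data, table):
    """Decode bytes produced by to_internal back into a Unicode word."""
    reverse = {slot: ch for ch, slot in table.items()}
    return "".join(reverse[b] for b in data)


if __name__ == "__main__":
    # kaf, teh, alef, beh: all present in ISO-8859-6
    word = "\u0643\u062a\u0627\u0628"
    # U+067E (peh) is in the 0600-06FF block but not in ISO-8859-6
    extra = "\u067e"

    print("missing from iso-8859-6:", uncovered_by("iso8859_6", word + extra))

    table = assign_slots(word + extra)
    encoded = to_internal(word, table)
    print("internal bytes:", encoded.hex())
    print("round trip ok:", from_internal(encoded, table) == word)
```

If `uncovered_by` comes back empty for every word in the dictionary, iso-8859-6 would be enough as the internal encoding. Otherwise the list it returns is exactly the set of extra characters a custom character set would have to make room for.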

