Kevin Atkinson wrote:
> The biggest change in Aspell 0.51 is support for Affix Compression.
> Affix compression is the act of combining several words with a common base
> word into one word which consists of the base word and a list of affixes
> to apply. (Affix is the generic term for prefix, suffix or infix). For
> example "alarm alarms alarmed alarming" will become "alarm/SDG" where SDG
> stands for the suffixes of alarm. This can make a huge difference in
> space for languages that have extensive affixation such as German.
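To make the "alarm/SDG" notation concrete, here is a minimal Python sketch of expanding such an entry back into full words. The flag letters S, D and G and their rules form a hypothetical, simplified table, not the actual English affix file shipped with ispell or Aspell. Applying only the first matching rule per flag is also a simplification of how real affix files list mutually exclusive conditions.

```python
import re

# Hypothetical, simplified suffix table: flag -> ordered rules of
# (condition_regex, strip, add). The first rule whose condition matches
# the base word is applied; an empty condition matches any word.
SUFFIXES = {
    "S": [(r"(s|x|z|ch|sh)$", "", "es"), (r"", "", "s")],
    "D": [(r"e$", "", "d"), (r"", "", "ed")],
    "G": [(r"e$", "e", "ing"), (r"", "", "ing")],
}

def expand(entry):
    """Expand a 'base/FLAGS' entry into the base word plus its derived forms."""
    base, _, flags = entry.partition("/")
    forms = [base]
    for flag in flags:
        for cond, strip, add in SUFFIXES.get(flag, []):
            if re.search(cond, base):
                # Remove the 'strip' characters from the end, then append the suffix.
                stem = base[: len(base) - len(strip)]
                forms.append(stem + add)
                break
    return forms

print(expand("alarm/SDG"))  # ['alarm', 'alarms', 'alarmed', 'alarming']
```

Storing one line instead of four is where the space saving comes from; the expansion is what lets a spell checker still recognize every listed form.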
While I welcome this improvement, I object to the term "affix compression". Making the dictionary file smaller (compression) may be one effect of using affix flags, but more important for languages such as German and Swedish is the guarantee that every grammatically legal ending of each word is covered by the dictionary. "Grammatical completeness" is the desired effect. It would be sad if this could not be combined with Aspell's much-appreciated sounds-alike function.

Ultimately, every time a new word is added to the dictionary, the correct affix flags should be added along with it. There is little point in adding "alarming" to the dictionary unless "alarm" and "alarms" are added at the same time.

The OCR software FineReader, version 6 and later (at least the English version), shows how a user interface for adding words to a dictionary with affix patterns can be designed. It is try-and-buy software for Microsoft Windows, so you can try it for free: http://www.finereader.com/

Roughly speaking, when the user wants to add a word to the dictionary, she is asked for the word's basic form (alarming -> alarm), and then all possible endings that the available affix flags can produce are listed with check boxes. The user checks the inflections that apply and submits the new word.

An example: You want to add "going" to the dictionary. The system asks for the basic form, and you enter "go". The system then asks which endings are legal: goes, goed, going. You check goes and going, and submit. The system stores go/SG. (This assumes that /S adds -es to words that end in a vowel.)

Will the affix definition file follow the ispell or MySpell format, or use its own format? I maintain a Swedish dictionary in ispell format from which I generate my Aspell dictionary, using "ispell -e" for expansion. Currently I have no good way to add new words interactively when using Aspell.
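The check-box dialog described above can be sketched in a few lines of Python. Everything here is hypothetical: the suffix table follows the post's own assumption that /S adds -es after a vowel, and `candidates` and `make_entry` are illustrative helpers, not part of FineReader's or Aspell's interface.

```python
import re

# Hypothetical suffix table (assumption taken from the example above:
# /S adds "-es" to words ending in a vowel). Ordered rules of
# (condition_regex, strip, add); the first matching rule applies.
SUFFIXES = {
    "S": [(r"[aeiou]$", "", "es"), (r"", "", "s")],
    "D": [(r"e$", "", "d"), (r"", "", "ed")],
    "G": [(r"e$", "e", "ing"), (r"", "", "ing")],
}

def derive(base, flag):
    """Return the form that 'flag' would produce from 'base', or None."""
    for cond, strip, add in SUFFIXES[flag]:
        if re.search(cond, base):
            return base[: len(base) - len(strip)] + add
    return None

def candidates(base):
    """Every ending the dialog would offer as a check box: {flag: form}."""
    return {flag: derive(base, flag) for flag in SUFFIXES}

def make_entry(base, checked):
    """Keep the flags whose derived form the user checked; return 'base/FLAGS'."""
    flags = "".join(f for f, form in candidates(base).items() if form in checked)
    return f"{base}/{flags}" if flags else base

print(candidates("go"))                     # {'S': 'goes', 'D': 'goed', 'G': 'going'}
print(make_entry("go", {"goes", "going"}))  # go/SG
```

Offering the generated forms rather than the flag letters means the user never needs to know the affix file; the flags are recovered from whichever forms she checks.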
I usually open my source dictionary file in Emacs, edit it, and then run "make" to rebuild my dictionaries; it is all batch oriented.

Ispell comes with the "munchlist" utility, which can be helpful in developing good dictionaries. If munchlist fails to apply an affix flag, it is because the expanded dictionary (the current Aspell format) did not contain one of the word's forms. My expanded Swedish Aspell dictionary has 5.34 times as many words (264K) as my ispell-format source file (49K entries). Munchlist can compress the expanded list to 48K entries, only marginally smaller than my source, because my source is written for grammatical correctness and is not mathematically optimized for list compression.

-- 
Lars Aronsson ([EMAIL PROTECTED])
Aronsson Datateknik - http://aronsson.se/

_______________________________________________
Aspell-devel mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/aspell-devel