Hi,

The most time-consuming suggestion algorithms in Hunspell are the MAP and n-gram suggestions. MAP is a character permutation algorithm: for a single misspelled French word with 8 vowels, it can check (using the MAP definition of your affix file) ~4^8 = 65 thousand possible suggestions.
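To make the 4^8 figure concrete, here is a rough Python sketch of the combinatorics. It is not Hunspell's actual code (which also applies a time limit), and the MAP groups below are made up for illustration, assuming four related characters per vowel; they are not copied from the real French affix file.

```python
from itertools import product

# Hypothetical MAP groups, in the spirit of affix-file lines such as
# "MAP eéèê". Illustrative only, not the real French definitions.
MAP_GROUPS = ["aàâä", "eéèê", "iîïy", "oôöò", "uùûü"]


def build_lookup(groups):
    """Map each character to the group of characters it may be swapped with."""
    return {ch: group for group in groups for ch in group}


def count_candidates(word, groups=MAP_GROUPS):
    """Number of variants a brute-force MAP expansion would have to check."""
    lookup = build_lookup(groups)
    total = 1
    for ch in word:
        # Unmapped characters contribute a single choice: themselves.
        total *= len(lookup.get(ch, ch))
    return total


def map_candidates(word, groups=MAP_GROUPS):
    """Yield every variant of `word`, the original spelling included."""
    lookup = build_lookup(groups)
    choices = [lookup.get(ch, ch) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)


if __name__ == "__main__":
    # A made-up string with 8 mapped vowels, not a real French word.
    print(count_candidates("aeiouaei"))
    print(sorted(map_candidates("été"))[:4])
```

Every additional mapped character multiplies the number of candidates by the size of its group, which is why limiting the MAP definition has such a large effect.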
The MAP algorithm has a time limit (~1 sec.), which is acceptable in word processing. The running time of the n-gram based suggestion algorithm, however, depends only on the dictionary size: with more than 100 thousand dictionary words, n-gram suggestion can take a long time for a long word.

Hunspell with the 90 thousand words of the recent French dictionary is not too slow for a single suggestion, or for spell checking a long document without suggestions. For other tasks (automatic spell checking of long texts with suggestions), remove or limit the MAP definition, and use MAXNGRAMSUGS 0 in the affix file to disable n-gram suggestions.

Another option is to use affixes to compress a large dictionary (~200-300 thousand words). There is a new tool in the Hunspell distribution for automatic affix compression, "affixcompress". A Mongolian word list with 2.7 million words has been compressed to 77 thousand words by affixcompress:

http://www.openoffice.org/issues/show_bug.cgi?id=92263

(Note: affixcompress is not the best tool for an agglutinative language like Mongolian, but I hope future versions will be able to detect the morphology and classify the words of a huge corpus. For now, the output of affixcompress helps to detect real stems and frequent suffixes in the words of a text corpus.)

It is also useful to split a large dictionary into a base part (~100 thousand words) and extra dictionaries. The Hunspell library and standalone Hunspell already support extra dictionaries (see the Hunspell manual). I hope OpenOffice.org will also support this feature in the near future.

Best regards,
László

2008/8/18 Thomas Lange - Sun Germany - ham02 - Hamburg <[EMAIL PROTECTED]>:
> Hi,
>
> I don't know the algorithm used in hunspell.
> Thus how about other dictionaries?
> Maybe the large number of entries in the affix file is exactly the
> reason why it can be so fast with Hungarian words...
>
> Thomas
