Hello! I work on a project that is building spell checkers for Northern and Lule Sami, including a high-quality Aspell spell checker.
We use the Xerox two-level morphology tools to generate full-form word lists. The Northern Sami full-form word list is currently about 24 GB. It can be broken down into word forms, each consisting of a single stem plus inflectional and other endings. A single word can have up to 16,000 unique endings, and the set of inflectional endings varies from word to word, so we have several distinct sets of endings. We do not yet know exactly how many such sets Aspell will need, but the present Xerox-based lexicons use more than 150.

As a test, we made an affix file containing the 16,000 unique endings of one of our words, and that file alone came to 1.5 MB. By our calculations, if we continue this way for all our words, we will end up with an affix file as large as 50 MB.

As far as we understand, the affix file allows 52 affix classes. We will probably need more than that. Is it possible to increase this number?

If it is not, we will probably end up with a very large word list, possibly on the order of a gigabyte. How well will Aspell handle a word list of that size?

Regards,
--
Børre Gaup
Prošeaktamielbargi - Project worker
tel(W): +47 77 64 59 64
tel(GSM): +47 41 08 03 64
e-mail:[EMAIL PROTECTED]
http://divvun.no/english.html

_______________________________________________
Aspell-devel mailing list
Aspell-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/aspell-devel