On Saturday, 25 November 2006, 00:12, Kevin Atkinson wrote:
> On Fri, 24 Nov 2006, Børre Gaup wrote:
> > I work in a project which is going to make spellcheckers for Northern and
> > Lule Sami, among others a high-quality Aspell spell checker.
> >
> > We use Xerox two-level morphological tools to make fullform word lists.
> > The Northern Sami fullform word list is now about 24GB. The word list can
> > be broken down into word forms covering a single stem + inflection and
> > other endings. Each word can have up to 16000 unique endings, and the set
> > of inflectional endings a word can have varies. We thus have several such
> > sets of inflectional endings. The exact number needed for Aspell is not
> > yet known, but the present Xerox-based lexicons have more than 150 such
> > sets.
>
> It sounds like hunspell might be a better choice since it supports twofold
> affix stripping. I would very much like to incorporate many of hunspell's
> features into Aspell but I simply don't have the time. I would greatly
> appreciate any help in this area.
>

The problem is that hunspell is not as ubiquitous as aspell. As far as I have
seen, hunspell is not commonly used, whereas aspell is used both in Linux and
in Mac OS X (through Cocoaspell). Hunspell is _intended_ to replace myspell in
openoffice.org (according to its homepage).
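
For readers who have not met the term, here is a rough sketch of what twofold
affix stripping means. It is a toy illustration only, not hunspell's actual
implementation and not real Sami morphology: the stems, suffixes and class
names (STEMS, INNER, OUTER, "A", "X") are all made up. The idea is that an
inner suffix can itself carry a class of outer suffixes, so stem + inner +
outer forms are accepted without listing every combined ending.

```python
# Toy sketch of twofold (two-level) suffix stripping.
# All data and names here are hypothetical, chosen only for illustration.

# stem -> inner suffix classes the stem accepts
STEMS = {"work": {"A"}, "spell": {"A"}}

# inner class -> {suffix: outer classes allowed after that suffix}
INNER = {"A": {"er": {"X"}, "ing": set()}}

# outer class -> suffixes
OUTER = {"X": {"s"}}


def _strip_inner(word, required_outer=None):
    """True if word is stem + inner suffix (optionally one that permits
    the given outer class to follow it)."""
    for cls, suffixes in INNER.items():
        for suffix, allowed_outer in suffixes.items():
            if not word.endswith(suffix):
                continue
            if required_outer is not None and required_outer not in allowed_outer:
                continue
            stem = word[: -len(suffix)]
            if cls in STEMS.get(stem, set()):
                return True
    return False


def check(word):
    """Accept a bare stem, stem + inner, or stem + inner + outer."""
    if word in STEMS:
        return True
    if _strip_inner(word):
        return True
    for cls, suffixes in OUTER.items():
        for outer in suffixes:
            if word.endswith(outer) and _strip_inner(
                word[: -len(outer)], required_outer=cls
            ):
                return True
    return False


if __name__ == "__main__":
    for w in ["work", "worker", "workers", "spelling", "workings"]:
        print(w, check(w))
```

With n inner and m outer suffixes, a scheme like this needs roughly n + m
rules rather than n * m combined endings, which is why it could matter for
endings sets as large as ours.
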
What features in hunspell would you specifically like to have in aspell?

> > We made an affix file containing the 16000 unique endings one of our
> > words had, and that file alone became 1.5 MB. Our calculations tell us
> > that if we continue in this vein for all our words, we will end up with
> > an affix file that can be as big as 50MB.
> >
> > As far as we understand there are 52 available affix classes for the
> > affix file. It is probable that we would need more affix classes than the
> > existing 52. Is it possible to increase this number?
>
> More like around 200 since you can use any 8-bit symbol.
>

Ok, then that misunderstanding is cleared up.

> > If that is not possible, we will probably end up with a very big
> > wordlist, amounting up to some gigabyte. How well will aspell tackle a
> > wordlist of that size?
>
> Well Aspell should do just fine if it will all fit in memory. All bets
> are off if it doesn't. :)

-- 
Børre Gaup
Prošeaktamielbargi - Project worker
tel(W): +47 77 64 59 64
tel(GSM): +47 41 08 03 64
e-mail:[EMAIL PROTECTED]
http://divvun.no/english.html

_______________________________________________
Aspell-devel mailing list
Aspell-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/aspell-devel