Dear Kevin,

I'm pleased to hear you are trying to extend aspell support as widely as possible. I'm hoping I can contribute in a substantial way here. I have some web crawling software available that targets particular languages:
http://borel.slu.edu/crubadan/

It "bootstraps" a model of the target language based on previously seen texts and rarely makes mistakes if provided with sufficient "seed" texts. As you can see on the status page, I've built up text corpora for quite a few languages.

Part of the crawler is a module that ranks words by the likelihood that they are actually correctly spelled words in the target language. The highest-frequency words make the cut, of course; n-gram statistics are also calculated, which are a good way of disqualifying the foreign (mostly English) words that sneak in. In the cases where I can find a dictionary, I can check any suspect words manually. This is also, I should say, an excellent way of improving existing word lists. I've been in contact with the Breton and Welsh maintainers already.

The upshot is that I should be able to package up reasonably clean word lists for Manx Gaelic (gv), Scottish Gaelic (gd), Cebuano (ceb -- though I think "proc" chokes on 3-letter ISO-639 codes), and Setswana (tn). I've been contacted about starting Bambara (bm) as well. The Walloon ispell dictionary has a Makefile target that builds and installs an aspell dictionary, so that should be easy enough.

Perhaps in future, if you have speakers of small languages contacting you about creating spellcheckers from scratch, you can direct them to me. I should mention that it works out of the box for ISO-8859 character sets but takes some effort for UTF-8...

-Kevin

_______________________________________________
Aspell-devel mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/aspell-devel
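
The n-gram filtering mentioned above could look roughly like the following minimal sketch. It is not the crawler's actual code: the function names, the tiny seed lists, and the choice of padded character trigrams with add-one smoothing are all assumptions made for illustration. The idea is to train one trigram model on target-language text and one on English, then flag a candidate word as foreign when the English model scores it higher.

```python
import math
from collections import Counter


def trigrams(word):
    """Character trigrams of a word, padded with ^ and $ boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(words):
    """Count character trigrams over a list of seed words."""
    counts = Counter()
    for w in words:
        counts.update(trigrams(w))
    return counts


def avg_log_prob(word, counts):
    """Average add-one-smoothed log probability of the word's trigrams."""
    grams = trigrams(word)
    if not grams:  # empty input has no trigrams to score
        return float("-inf")
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen trigrams
    return sum(math.log((counts[g] + 1) / (total + vocab)) for g in grams) / len(grams)


def looks_foreign(word, target_counts, foreign_counts):
    """True if the foreign model explains the word better than the target model."""
    return avg_log_prob(word, foreign_counts) > avg_log_prob(word, target_counts)


# Toy seed lists (hypothetical; a real run would use whole corpora).
GD_SEED = ["agus", "taigh", "beag", "uisge", "latha", "oidhche",
           "facal", "leabhar", "seo", "sin", "le", "dubh"]
EN_SEED = ["the", "and", "house", "water", "night", "language",
           "word", "book", "small", "this", "that", "with"]

gd_counts = train(GD_SEED)
en_counts = train(EN_SEED)

for candidate in ["taighean", "waters", "with", "taigh"]:
    flag = "foreign?" if looks_foreign(candidate, gd_counts, en_counts) else "keep"
    print(f"{candidate}: {flag}")
```

With seed lists this small the scores are only indicative. In practice the models would be trained on the full crawled corpus, and words flagged this way would still be checked by hand against a dictionary where one exists.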