Does anyone know of a decent database of mis-spellings, that is, a mapping from mis-spelling to correct spelling (or vice versa)? I'm currently using a 550-item set adapted from:
http://www.actwin.com/rwmack/spelling.htm
and it's fine for testing, but I'm looking for something with a few tens of thousands of entries. Basically, I want to build a "common error tracking" system into my spell-checker, and would like a corpus of real-world English data so that I can judge the effectiveness of the new feature once it is built.
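
To make "judge the effectiveness" concrete, here is a minimal sketch in Python of how a corpus of (mis-spelling, intended word) pairs could be scored against a suggestion function. Everything named here is an assumption for illustration: `evaluate`, `toy_suggest`, the tiny `WORDS` list and the sample `PAIRS` are hypothetical, and `toy_suggest` uses the standard library's `difflib.get_close_matches` only as a stand-in for the real checker.

```python
from difflib import get_close_matches


def evaluate(suggest, pairs, top_n=5):
    """Score a suggestion function against (mis-spelling, intended) pairs.

    Returns (hit_rate, misses), where misses lists the pairs whose intended
    word was not among the first top_n suggestions.
    """
    misses = []
    for wrong, right in pairs:
        if right not in suggest(wrong)[:top_n]:
            misses.append((wrong, right))
    hits = len(pairs) - len(misses)
    hit_rate = hits / len(pairs) if pairs else 0.0
    return hit_rate, misses


# Hypothetical stand-in for the real spell-checker's suggestion call.
WORDS = ["receive", "separate", "definitely", "necessary"]


def toy_suggest(word):
    return get_close_matches(word, WORDS, n=5, cutoff=0.6)


# Hypothetical sample of the kind of corpus being asked for.
PAIRS = [("recieve", "receive"), ("seperate", "separate")]

if __name__ == "__main__":
    rate, missed = evaluate(toy_suggest, PAIRS)
    print(f"hit rate: {rate:.2%}, missed: {missed}")
```

Keeping the list of misses, rather than just the rate, makes it easy to see which kinds of error a new feature fixes and which it leaves behind.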

BTW, for those following along at home, I've got the phonetic searching working quite nicely (0.1 to 0.6 s per search on average using Metakit), and it finds the correct words in the above test-set most of the time. There are a number of examples that resist phonetic comparison quite strongly, which accounts for a few of the errors. (Basically, high-level rules mistakenly compress features, so that similarly-spelled sub-strings wind up with entirely different phonetic encodings.) My latest optimisation of the distance > 1 searches (reversed-word indexing) also means that a few entries where errors occur at both the beginning and the end of the word aren't caught. All in all, about 20 of the mis-spellings don't get their intended word as a suggestion.
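
For anyone wondering why a reversed-word index misses words with errors at both ends, here is a rough Python sketch of the general idea. It is an assumption about the technique, not the actual implementation: it works on plain spellings rather than phonetic encodings, uses in-memory dictionaries instead of Metakit, and the names `TwoEndedIndex`, `key_len` and `edit_distance` are made up for the example.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Plain Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class TwoEndedIndex:
    """Index words by their first and last key_len characters."""

    def __init__(self, words, key_len=2):
        self.key_len = key_len
        self.by_prefix = defaultdict(set)
        self.by_suffix = defaultdict(set)  # keyed on the reversed word
        for w in words:
            self.by_prefix[w[:key_len]].add(w)
            self.by_suffix[w[::-1][:key_len]].add(w)

    def candidates(self, query):
        # Only words sharing an intact start OR an intact end are examined.
        k = self.key_len
        return (self.by_prefix.get(query[:k], set())
                | self.by_suffix.get(query[::-1][:k], set()))

    def search(self, query, max_distance=2):
        scored = ((edit_distance(query, w), w) for w in self.candidates(query))
        return sorted((d, w) for d, w in scored if d <= max_distance)


if __name__ == "__main__":
    index = TwoEndedIndex(["separate", "receive", "necessary"])
    print(index.search("seperate"))  # start intact: found via the prefix index
    print(index.search("xeceive"))   # end intact: found via the reversed index
    print(index.search("xeceivx"))   # both ends damaged: no candidates at all
```

The last query is only two substitutions away from "receive", yet it is never compared against it, because neither end matches an index key. That is the trade-off: far fewer distance computations in exchange for missing the two-ended errors.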

Anyway, have fun all,
Mike

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/





_______________________________________________
Aspell-devel mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/aspell-devel
