I'm wondering if anyone knows of a decent mis-spellings database
anywhere? That is, a mapping from mis-spelling to correct spelling (or
vice-versa)? I'm currently using a 550-item set adapted from:
http://www.actwin.com/rwmack/spelling.htm
and it's fine for testing, but I'm looking for something that might have
a few tens of thousands of entries. Basically, I want to build a
"common error tracking" system into my spell-checker, and would like a
corpus of real-world English data so that I can judge the
effectiveness of the new feature when built.
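
For anyone wanting to run the same kind of measurement, here is a minimal sketch of scoring a checker against a mis-spelling to correct-spelling mapping. Everything in it is an assumption for illustration: `hit_rate`, `toy_suggest`, the `top_n` cut-off and the three sample pairs are made up, and `toy_suggest` merely stands in for whatever suggestion function the real spell-checker exposes.

```python
def hit_rate(pairs, suggest, top_n=10):
    """Score a suggestion function against (misspelling, correct) pairs.

    Returns the fraction of pairs whose correct spelling appears in the
    first top_n suggestions, plus the list of pairs that were missed.
    """
    missed = []
    for wrong, right in pairs:
        if right not in suggest(wrong)[:top_n]:
            missed.append((wrong, right))
    total = len(pairs)
    found = total - len(missed)
    return (found / total if total else 0.0), missed


def toy_suggest(word):
    """Hypothetical stand-in for a real checker's suggestion call."""
    known = {"recieve": ["receive"], "seperate": ["separate"]}
    return known.get(word, [])


# Tiny made-up sample; a real run would load the full mapping instead.
PAIRS = [
    ("recieve", "receive"),
    ("seperate", "separate"),
    ("definately", "definitely"),
]

rate, missed = hit_rate(PAIRS, toy_suggest)
print(f"found {rate:.0%} of intended words; missed: {missed}")
```

Keeping the missed pairs around, rather than only the percentage, makes it easy to see which kinds of error a new feature actually fixes.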
BTW, for those following along at home, I've got the phonetic searching
working quite nicely (0.1 to 0.6 s per search on average using Metakit), and it
finds the correct words in the above test-set most of the time. There
are a number of examples that resist phonetic comparison quite strongly,
which accounts for a few of the errors. (Basically, high-level rules
mistakenly compress features so that similarly-spelled sub-strings wind
up with entirely different phonetic encodings.) My latest optimisation
of the distance > 1 searches (reversed-word-indexing) also means that a
few entries where errors occur both in the beginning and the end of the
word aren't caught. All in all, about 20 of the errors don't get
their intended word as a suggestion.
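
To illustrate why errors at both ends slip through, here is a rough sketch of one way a reversed-word index could work. It is a guess at the general idea, not the actual implementation: it indexes plain spellings rather than phonetic encodings, uses in-memory dictionaries rather than Metakit, and the names `TwoEndedIndex`, `key_len` and `max_distance` are invented for the example.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


class TwoEndedIndex:
    """Index words by their first key_len letters, forwards and reversed."""

    def __init__(self, words, key_len=3):
        self.key_len = key_len
        self.forward = defaultdict(set)   # prefix -> words
        self.backward = defaultdict(set)  # reversed-word prefix -> words
        for word in words:
            self.forward[word[:key_len]].add(word)
            self.backward[word[::-1][:key_len]].add(word)

    def search(self, query, max_distance=2):
        """Return indexed words within max_distance that share an end."""
        k = self.key_len
        candidates = (self.forward.get(query[:k], set())
                      | self.backward.get(query[::-1][:k], set()))
        return sorted(w for w in candidates
                      if edit_distance(query, w) <= max_distance)


index = TwoEndedIndex(["separate", "definitely", "receive"])
print(index.search("seperate"))    # error in the middle: found via the prefix
print(index.search("xefinitely"))  # error at the start: found via the reversed index
print(index.search("xefinitelx"))  # errors at both ends: no candidates at all
```

The last query is only two edits away from "definitely", but neither its prefix nor its suffix matches an index key, so the word never becomes a candidate. That is the trade-off: only a small slice of the dictionary needs a full distance computation, at the cost of missing words damaged at both ends.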
Anyway, have fun all,
Mike
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
_______________________________________________
Aspell-devel mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/aspell-devel