I know that several people are each working on their own spell check engine
for MetaCard.  I have been too, and here's an idea worth sharing:
my own version of "soundex".

The phone company uses something called a soundex to match names that sound
alike.  Basically, it keeps the first character of a word, removes the vowels,
and assigns a number to each remaining consonant (consonants that sound alike
share a number).  I found this to be a neat idea, but not specific enough, so
I wrote a script to create my own version of the "soundex".  It keeps the
first character of a word and removes all the vowels after it to create a
"soundex" type code for spell check matching.  It works on the assumption
that when most people misspell a word, it's usually the vowels they get
wrong.  So "believe", "beleeve", and "beleive" all have the same "soundex",
which is "blv".

So here are the steps:
1.  Get a dictionary file.
2.  Create a soundex code for each word in your dictionary file.
3.  To check the spelling of a word, check it against the dictionary file.
4.  If it isn't in the dictionary file, get the soundex for it.
5.  Find words in your dictionary with the SAME soundex.
6.  Create a list of words with the same soundex and offer them as choices.
If there are more than 15, limit it to the first 15 choices.
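
As a rough sketch, steps 3 through 6 might look something like this in a
script.  The names spelledRight, suggestionsFor, theDictionary, and theCodes
are made up for illustration.  The sketch assumes theDictionary holds one word
per line, theCodes holds the matching code on the same line number, and
soundex is the function from this post.

```
-- step 3: is the word in the dictionary at all?
function spelledRight theWord, theDictionary
  return theWord is among the lines of theDictionary
end spelledRight

-- steps 4 to 6: up to 15 dictionary words that share theWord's code
function suggestionsFor theWord, theDictionary, theCodes
  put soundex(theWord) into theCode
  put empty into theList
  put 0 into lineNum
  repeat for each line thisCode in theCodes
    add 1 to lineNum
    if thisCode is not theCode then next repeat
    put line lineNum of theDictionary & return after theList
    if the number of lines of theList = 15 then exit repeat
  end repeat
  delete last char of theList  -- drop the trailing return
  return theList
end suggestionsFor
```

A caller would only ask for suggestions when spelledRight returns false.  An
empty result from suggestionsFor means no dictionary word shares the code.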

This method is not perfect, but it's a heck of a lot closer than anything
else I've tried.  The function I wrote is listed below.

function soundex thisWord
  put char 1 of thisWord into thisWord2  -- always keep the first letter
  repeat with j = 2 to (the number of chars of thisWord)
    if char j of thisWord = char (j - 1) of thisWord then next repeat  -- skip doubled letters
    if char j of thisWord is not in "aeiouyh" then  -- drop vowels, y, and h
      put char j of thisWord after thisWord2
    end if
  end repeat
  return thisWord2
end soundex

I chose to also exclude the letter h, because it's very often silent when
it's not the first character in a word.  The letter y is dropped along with
the vowels.  Line 4 of the function skips doubled letters, because people
often misspell a word by leaving out a doubled consonant or by doubling a
consonant they shouldn't have.
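
To see the effect of those choices, a quick test along these lines could be
dropped into a button script (the mouseUp handler is just for illustration,
and it assumes the soundex function above is somewhere in the message path).
Each line of output shows a correct spelling and a common misspelling side by
side.

```
on mouseUp
  -- each pair should come out as the same code twice
  put soundex("believe") && soundex("beleive") & return & \
      soundex("occasion") && soundex("ocasion") & return & \
      soundex("rhythm") && soundex("rythm")
end mouseUp
```

This should put "blv blv", "ocsn ocsn", and "rtm rtm" into the message box.
Unrelated words can share a code too ("bad", "bed", and "bid" all come out as
"bd"), which is why the steps above offer a list of choices rather than a
single correction.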

For my uses, I created two separate files: a dictionary file and a soundex
file.  Each word is on the same line number in both files.  I suppose you
could also use one file with each line having two items, the word and its
soundex.  But then you can't use the offset function to find matches.
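
A minimal sketch of building the second file from the first might look like
this.  The handler name buildCodeFile and the file names dictionary.txt and
soundex.txt are placeholders, not the names I actually used, and the sketch
assumes one word per line in the dictionary file.

```
on buildCodeFile
  -- placeholder file names; substitute your own paths
  put url "file:dictionary.txt" into theDictionary
  put empty into theCodes
  repeat for each line thisWord in theDictionary
    -- one code per word, so line numbers stay in step with the dictionary
    put soundex(thisWord) & return after theCodes
  end repeat
  delete last char of theCodes  -- drop the trailing return
  put theCodes into url "file:soundex.txt"
end buildCodeFile
```

Because every word gets exactly one line of output, even a blank line in the
dictionary keeps its place, and line N of the soundex file always belongs to
line N of the dictionary file.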

Anyway, this took me forever, so I thought I'd share.



This is the MetaCard mailing list.
Archives: http://www.mail-archive.com/metacard%40lists.best.com/
Info: http://www.xworlds.com/metacard/mailinglist.htm
