Verity, Lucene and some others have stemming, which goes some way
toward matching singulars and plurals.

If you are looking for homonyms or synonyms, you need a thesaurus.

I've used Soundex quite extensively. Many state & local govts use
it for things like name searches:

Schmidt, Smith, Smythe give the same results.

But Pfiffer and Fiffer resolve to different soundex codes.

AFAIR, soundex is built into several dbs including SQL-Server.
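
For anyone who wants to see why those names collapse together, here is a
minimal sketch of the classic American Soundex rules in Python. The
function name and the 4-character padding are my own choices for
illustration, and the Soundex built into a given database may differ in
small details (notably how it treats H and W), so treat this as an
approximation rather than what SQL-Server does internally.

```python
# Digit assigned to each consonant group; vowels, H, W and Y get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character Soundex code (hypothetical helper, not a library call)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # but its digit still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not break a run of equal digits
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, so a repeat after it counts
    return (result + "000")[:4]         # pad with zeros, truncate to 4 characters


for n in ("Schmidt", "Smith", "Smythe", "Pfiffer", "Fiffer"):
    print(n, soundex(n))
```

The first letter is always kept verbatim, which is exactly why Pfiffer
and Fiffer can never match no matter how alike they sound.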

But it has been my experience that unless you are a professional
searcher, like a librarian, these "advanced" search methods will
rarely be used.

There is one type of searching that offers a lot of possibility to
rapidly get the user the results he seeks: a concordance with the
output as a KWIC (key word in context) index.

You do a multiple word (or multiple partial word) search and each
result is shown on a line with context and search (key) words
highlighted. A hit will generate multiple lines. The lines are sorted
and presented so each line pivots around one key word, with the others
highlighted.

It is verbose & butt-ugly, but you can rapidly run your finger (cursor)
down the pivot word, see it in context with the other words, and find
the hit you want.
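
A rough Python sketch of that kind of output, to make the idea concrete.
Everything here (the function names, the 30-character context window,
square brackets for the pivot word and upper case for the other key
words) is an assumption of mine for illustration. It only matches whole
words, not partial words, and it is nowhere near a real concordance
engine.

```python
import re


def kwic_rows(docs, keywords, width=30):
    """Build one row per key-word occurrence, sorted by the pivot word."""
    keys = {k.lower() for k in keywords}
    rows = []
    for doc_id, text in enumerate(docs):
        tokens = list(re.finditer(r"\w+", text))
        # Keep only documents that contain every search word.
        if not keys <= {t.group().lower() for t in tokens}:
            continue
        for t in tokens:
            word = t.group().lower()
            if word in keys:
                left = text[max(0, t.start() - width):t.start()]
                right = text[t.end():t.end() + width]
                rows.append((word, doc_id, left, t.group(), right))
    # Sort by pivot word, then by what follows it, so similar hits sit together.
    rows.sort(key=lambda r: (r[0], r[4].lower(), r[1]))
    return rows


def format_kwic(rows, keywords, width=30):
    """Render rows with the pivot in [brackets] and other key words in CAPS."""
    keys = {k.lower() for k in keywords}

    def shout(match):
        w = match.group()
        return w.upper() if w.lower() in keys else w

    lines = []
    for word, doc_id, left, key, right in rows:
        left = re.sub(r"\w+", shout, left).replace("\n", " ")
        right = re.sub(r"\w+", shout, right).replace("\n", " ")
        lines.append(f"{doc_id:>4} {left:>{width}} [{key}] {right}")
    return lines


docs = ["The quick brown fox", "A brown dog and a quick cat", "Only brown here"]
for line in format_kwic(kwic_rows(docs, ["quick", "brown"]), ["quick", "brown"]):
    print(line)
```

Each hit produces one line per key word it contains, and because the
pivot word lands in the same column on every line you can scan straight
down it.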

Amazon, Google, and some other sites claim to offer KWIC, but they
don't really. They just offer a multi-word search and highlight the
words within a body of text. It is very difficult to compare the
validity of one hit with another. They find the hits quickly, but the
user must dig thru non-contiguous paragraphs to evaluate them. If they
presented the hits in a KWIC index it would be easy for the user.

Anyway, google for concordance and KWIC (KWAC and KWOC are just
variants).

HTH

Dick

"The reason there are two senators for each state is so that one can be
the designated driver."
-Jay Leno -

On Jul 25, 2004, at 9:31 AM, Roberto Perez wrote:

> At 10:57 AM 7/25/04, I myself wrote:
>
>  >I'm building an online dictionary with CFMX and
>  >Access 2000. I'd like to do what Webster's (http://www.m-w.com) does,
>  >which is: if you search for "kaflooey", the results page tells you
>  >there's no such entry in the dictionary, and then gives you a list of
>  >10 possible alternatives, similar in spelling (e.g., "kerflooey").
>
>  I've been doing some additional searching, and a message on this forum
>  a couple of weeks ago (Paul Vernon,
>  http://www.houseoffusion.com/lists.cfm/link=m:4:33876:170342)
>  tangentially talks about a dictionary search on several commercial
>  databases (e.g., Webster, CIA factbook, etc.). They seem to be using a
>  number of alternative algorithms to match strings. I found several
>  websites that mention these strategies:
>
>      exact      Match words exactly
>      prefix     Match prefixes
>      suffix     Match suffixes
>      substring  Match substring occurring anywhere in word
>      re         POSIX 1003.2 regular expressions (-)
>      re1        Old (basic) regular expressions
>      fnmatch    fnmatch-like (* ? as wildcards) (-)
>      soundex    Match using SOUNDEX algorithm (--)
>      lev        Match words within Levenshtein Distance One
>      metaphone  metaphone algorithm (--)
>
>  Has anyone used any of these systems (or any other way of matching
>  words to a dictionary database)? Any ideas/suggestions on how to
>  implement any of these? (Levenshtein gave me the best results when I
>  tried several searches at http://www.web-architect.co.uk/cfx_dict.cfm)
>
>  Thanks in advance,
>
>  Roberto Perez
>  [EMAIL PROTECTED]
>