You can also store a phonetic key for each term to find sounds-like matches.
I use the Double Metaphone algorithm, which appears to be English-specific.
I am not sure whether there is an equivalent for Dutch.
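
To illustrate the idea of storing a phonetic key next to each term, here is
a minimal Python sketch. Note that it is not Double Metaphone (which is too
long to reproduce here); it uses the much simpler classic Soundex as a
stand-in, and the function name soundex is just my choice for the example.
Soundex is also English-oriented, so take the Dutch results as a
coincidence rather than a recommendation.

```python
# Classic American Soundex, used here only as a simple stand-in for
# Double Metaphone to show what a "phonetic key" looks like.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key of word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if code and code != prev:
            key += code
        # 'h' and 'w' do not separate equal codes; vowels do.
        if ch not in "hw":
            prev = code
    return (key + "000")[:4]


# Terms that share a key are treated as sounds-like candidates.
print(soundex("fiets"), soundex("feits"))
```

With this key, 'fiets' and 'feits' end up with the same value, so a lookup
on the key field would find one from the other even though they share no
3-gram.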

For the length, I use a relative distance cutoff (distance/length) in
addition to the absolute cutoff, which on its own doesn't work very well
for short words (as you mentioned).
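
Roughly, the combined check could look like the Python sketch below. The
names is_plausible, max_abs and max_rel, the threshold values, and the
choice to divide by the length of the query word are all assumptions made
for the example, not my actual settings.

```python
def levenshtein(a, b):
    """Plain edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def is_plausible(query, candidate, max_abs=2, max_rel=0.5):
    """Accept candidate only if it passes both cutoffs (hypothetical values)."""
    if not query:
        return False
    # The length difference is a lower bound on the edit distance, so
    # hopeless candidates can be rejected without computing it.
    if abs(len(query) - len(candidate)) > max_abs:
        return False
    distance = levenshtein(query, candidate)
    # Absolute cutoff plus relative cutoff (distance / query length).
    return distance <= max_abs and distance / len(query) <= max_rel


print(is_plausible("feits", "fiets"))     # distance 2 on 5 letters
print(is_plausible("per", "tupperware"))  # length difference is already 7
print(is_plausible("of", "al"))           # distance 2 on 2 letters
```

The last call shows why the absolute cutoff alone is weak for short words:
a distance of 2 passes it, yet it means every letter of a two-letter word
was changed, and the relative cutoff rejects it.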

Alexey

-----Original Message-----
From: Aad Nales [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 18, 2004 11:59 AM
To: [EMAIL PROTECTED]
Subject: n-gram indexing for generating spell suggestions

...

2. Common misspellings of Dutch words of 4 or 5 characters were missed.
E.g. 'fiets' was not suggested as a possible spelling suggestion for
'feits', since no matching 3-gram exists between the two. The same held
true for misspellings based on 'ch' and 'g', both being the same sound in
Dutch but written differently.

3. Words that could never be part of a suggestion were added based on a
single matching n-gram. E.g. if I ask for suggestions on 'per' then
'tupperware' is also suggested, although based on length alone it is clear
that it has a minimum edit distance of 7.

...

