You can also store a phonetic key for the term to find sounds-like matches. I use double metaphone algorithm which appears to be English specific. Not sure if there is something out there for Dutch.
For the length, I use relative distance cutoff (distance/length) in addition to the absolute length cutoff that doesn't work very well for short words (as you mentioned). Alexey -----Original Message----- From: Aad Nales [mailto:[EMAIL PROTECTED] Sent: Monday, October 18, 2004 11:59 AM To: [EMAIL PROTECTED] Subject: n-gram indexing for generating spell suggestions ... 2. often used misspelings in Dutch words between 4 and 5 characters were missed. E.g. 'fiets' was suggested as a possible spell suggestion for 'feits' since no matching 3gram exist between the two. The same held true for misspellings based on 'ch' and 'g' both being the same sound in Dutch but written differently. 3. words that could never be part of a suggestion were added based on a single matchting n-gram. (e.g. if I ask for suggestions on 'per' then tupperware is also suggested. But solely based on length it is clear that it has a minimal distance of 7. ... --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]