I'm not sure how well it works with PostgreSQL, but for Levenshtein, you
can screen out strings whose length differs from your starting string's by
more than your target threshold, since every character of length
difference costs at least one edit. So, if you have a 7-character string
and want an edit distance of at most 2, your candidate pool can be limited
to strings with lengths 5-9. (Off the top of my head.) If the PostgreSQL
character_length() function is fast enough, this can help. For bigram
comparisons, you could potentially apply a similar kind of pre-slicing.
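
To make the length screen concrete, here is a rough sketch in Python
rather than SQL. The names levenshtein() and close_matches() are made up
for illustration, and the edit distance is a plain textbook
implementation, not whatever PostgreSQL uses internally. The point is
only the order of operations: cheap length check first, expensive
distance second.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def close_matches(target: str, candidates: list[str], max_distance: int) -> list[str]:
    """Return candidates within max_distance edits of target.

    Hypothetical helper: the length window runs first because a length
    difference of d always needs at least d insertions or deletions.
    """
    shortest = len(target) - max_distance
    longest = len(target) + max_distance
    # Cheap pre-filter: only strings whose length falls in the window.
    shortlist = [c for c in candidates if shortest <= len(c) <= longest]
    # Expensive step, now run on the reduced pool only.
    return [c for c in shortlist if levenshtein(target, c) <= max_distance]


words = ["example", "exampel", "sample", "ex", "examples!!!", "exmple"]
matches = close_matches("example", words, 2)
```

With a 7-character target and a threshold of 2, "ex" and "examples!!!"
fall outside the 5-9 window and never reach the distance calculation.
Whether the same trick pays off inside PostgreSQL depends on how cheap
the length test is there compared to the distance function.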

Not sure if this pays off in your case, but if you can do fast filters
before applying the more intensive coefficient calculation, maybe it would
be of some help?