eks dev wrote:
Hi Doug,
Perhaps. Are folks really better at spelling the beginning of words?
Yes they are. There were some comprehensive empirical studies on this topic. Winkler modification on Jaro string distance is based on this assumption (boosting similarity if first n, I think 4, chars match). Jaro-Winkler is well documented and some folks thinks that it is much more efficient and precise than plain Edit distance (of course for normal language, not numbers or so). I will try to dig-out some references from my disk on
Good ole Citeseer finds 2 docs that seem relevant:
http://citeseer.ist.psu.edu/cs?cs=1&q=Winkler+Jaro&submit=Documents&co=Citations&cm=50&cf=Any&ao=Citations&am=20&af=Any
I have some of the ngram spelling suggestion stuff, based on earlier msgs in this thread, working in my dev tree. I'll try to get a test site up later today for people to fool around with.
this topic, if you are interested.
On another note, I would even suggest using Jaro-Winkler distance as default for fuzzy query. (one could configure max prefix required => prefix query to reduce number of distance calculations). This could speed-up fuzzy search dramatically.
Hope this was helpful, Eks
___________________________________________________________ALL-NEW Yahoo! Messenger - all new features - even more fun! http://uk.messenger.yahoo.com
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
