Hi! I have a contact application where I need to display possible duplicates among the existing contacts. By "possible duplicates" I mean different contact entries that refer to the same person and may contain the same or slightly different information (e.g. typos).
What I currently do is search for different levels of duplication (it's a single UNION of 3 queries):

- the first query searches for exact duplicates (exactly the same name, address, email, phone, etc.);
- the second query searches for matches using the Soundex algorithm on a restricted set of fields, and its matches are given a lower matching score;
- the third query applies Soundex to more fields, and its matches are given an even lower matching score.

Is there a better algorithm or approach for this kind of fuzzy duplicate search over multiple fields (firstname, lastname, address, etc.)? Pointers to Wikipedia, books, etc. are appreciated.

-- 
Mack

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:338354
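For reference, the three-tier scheme above can be sketched outside SQL roughly like this. It is a minimal illustration, not the actual queries: the `Contact` fields, which fields each tier compares with Soundex, and the scores 1.0 / 0.7 / 0.4 are all hypothetical. Each tier is assumed to compare its listed fields by Soundex code and every remaining field by exact equality, so more Soundex fields means a looser match and a lower score. The `soundex` function implements the standard four-character American Soundex code.

```python
from dataclasses import dataclass

# American Soundex digit classes; vowels, y, h and w have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(value: str) -> str:
    """Return the 4-character Soundex code (first letter + 3 digits)."""
    letters = [c for c in value.lower() if c.isalpha()]  # digits/spaces dropped
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:  # adjacent letters with the same code collapse
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":  # h/w do not separate equal codes; vowels do
            prev = code
    return result.ljust(4, "0")


@dataclass(frozen=True)
class Contact:  # hypothetical record shape
    firstname: str
    lastname: str
    address: str
    email: str
    phone: str


ALL_FIELDS = ("firstname", "lastname", "address", "email", "phone")

# Hypothetical tiers: (score, fields compared by Soundex); all other fields exact.
TIERS = (
    (1.0, ()),                                   # tier 1: exact duplicate
    (0.7, ("firstname", "lastname")),            # tier 2: Soundex on a few fields
    (0.4, ("firstname", "lastname", "address")), # tier 3: Soundex on more fields
)


def _field_matches(a: Contact, b: Contact, field: str, fuzzy: tuple) -> bool:
    va, vb = getattr(a, field), getattr(b, field)
    return soundex(va) == soundex(vb) if field in fuzzy else va == vb


def match_score(a: Contact, b: Contact) -> float:
    """Score of the strictest tier the pair satisfies, or 0.0 if none."""
    for score, fuzzy in TIERS:
        if all(_field_matches(a, b, f, fuzzy) for f in ALL_FIELDS):
            return score
    return 0.0


def possible_duplicates(contacts: list) -> list:
    """All (i, j, score) pairs with a non-zero score, best matches first."""
    pairs = []
    for i in range(len(contacts)):
        for j in range(i + 1, len(contacts)):
            s = match_score(contacts[i], contacts[j])
            if s > 0:
                pairs.append((i, j, s))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    people = [
        Contact("Robert", "Smith", "12 Main St", "rob@example.com", "555-0100"),
        Contact("Robert", "Smith", "12 Main St", "rob@example.com", "555-0100"),
        Contact("Rupert", "Smyth", "12 Main St", "rob@example.com", "555-0100"),
        Contact("Alice", "Jones", "9 Elm Rd", "alice@example.com", "555-0199"),
    ]
    print(possible_duplicates(people))  # [(0, 1, 1.0), (0, 2, 0.7), (1, 2, 0.7)]
```

Two limitations show up even in this sketch. Soundex keeps only the first letter plus three digits and ignores digits entirely, so on a field like an address it is a very coarse comparison. The nested loop also compares every pair, so the cost grows quadratically with the number of contacts.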