Sadly, I don't think Soundex is going to help you here. It finds words that sound like other words (there, their, they're), so it won't reliably pick up typos. I think comparing each contact field on an OR basis is a sufficient way to find dupes: if none of those fields are the same, then it is not really a duplicate. Even if one field has a typo, the other fields are surely going to match.
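To make that concrete, here is a rough sketch in Python (not CFML, and not tested against your schema). It assumes the standard American Soundex rules and uses made-up field names (name, email, phone) and made-up sample contacts; `soundex` and `possible_duplicate` are hypothetical helpers, not anything from your app.

```python
# Letter-to-digit table for standard American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (out + "000")[:4]


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(str(value or "").lower().split())


def possible_duplicate(a, b, fields=("name", "email", "phone")):
    """OR-basis match: True if any non-empty field is equal after normalizing."""
    for field in fields:
        left, right = normalize(a.get(field)), normalize(b.get(field))
        if left and left == right:
            return True
    return False


# Hypothetical sample contacts, for illustration only.
original = {"name": "Mack Smith", "email": "mack@example.com", "phone": "555-0100"}
typo = {"name": "Nack Smith", "email": "MACK@example.com ", "phone": ""}
stranger = {"name": "Jane Doe", "email": "jane@example.com", "phone": "555-0199"}

print(soundex("there"), soundex("their"), soundex("they're"))  # homophones share a code
print(soundex("Mack"), soundex("Nack"))                        # first-letter typo does not
print(possible_duplicate(original, typo))       # matched through the email field
print(possible_duplicate(original, stranger))   # no field in common
```

The typo in the first letter of the name changes the Soundex code, but the OR comparison still flags the pair because the email matches once case and whitespace are normalized.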
Russ

On Wed, Oct 20, 2010 at 12:53 PM, Mack <mrsmith.w...@gmail.com> wrote:
>
> Hi!
>
> I have a contact application where I need to display possible
> duplicates within the existing contacts. Possible duplicates means
> different contact entries that refer to the same person and might have
> the same or slightly different information (typos).
>
> What I currently do is search for different levels of duplication
> (it's a single union of 3 queries):
> - the first query searches for exact duplicates (exactly the same
> name, address, email, phone, etc);
> - second query searches for matches using the soundex algorithm on a
> restricted set of fields and is given a lower matching score;
> - third query applies soundex on more fields and is given an even
> lower matching score.
>
> Is there a better algorithm or way to do this fuzzy duplication search
> over multiple fields (firstname, lastname, address, etc) ? Pointers to
> wikipedia, books, etc appreciated.
>
> --
> Mack

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:338356