Sadly, I don't think Soundex is going to help you. It finds words that
sound like other words (there, their, they're), so it isn't going to
reliably pick up typos.
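
To show what I mean, here is a rough sketch in Python (not ColdFusion, and
not anything from your app). The soundex() below is my own simplified take
on American Soundex, and levenshtein() is a plain edit-distance function
added only for contrast. Both function names and the sample names are made
up for illustration.

```python
# Letter-to-digit table for a simplified American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code, or "" for input with no letters."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def levenshtein(a, b):
    """Count the single-character edits needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Sound-alike words collapse to one code, which is what Soundex is for.
print(soundex("there"), soundex("their"), soundex("they're"))  # T600 T600 T600

# A one-letter typo can still change the code, so Soundex misses it,
# while the edit distance stays small.
print(soundex("Smith"), soundex("Smirh"))  # S530 S560
print(levenshtein("Smith", "Smirh"))       # 1
```

Soundex does forgive some typos by accident ("Snith" still codes the same
as "Smith"), but you can't count on it.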
I think comparing each contact field on an OR basis is a sufficient way to
find dupes. If none of those fields are the same, then it is not really a
duplicate. Even if you have a typo in the name, the other fields are
surely going to match.
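
Something along these lines is what I have in mind, again as a rough Python
sketch rather than real code for your app. The field names, the "id" key,
and the sample contacts are all assumptions on my part. In practice you
would probably do the comparison in SQL.

```python
from itertools import combinations

# Hypothetical field names; use whatever your contact table actually has.
FIELDS = ("name", "email", "phone", "address")


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(str(value or "").lower().split())


def shared_fields(a, b, fields=FIELDS):
    """Return the fields on which two contacts agree (empty values ignored)."""
    matched = []
    for field in fields:
        left = normalize(a.get(field))
        if left and left == normalize(b.get(field)):
            matched.append(field)
    return matched


def possible_duplicates(contacts):
    """Pair up contacts that match on at least one field (the OR basis)."""
    pairs = []
    for a, b in combinations(contacts, 2):
        matched = shared_fields(a, b)
        if matched:
            pairs.append((a["id"], b["id"], matched))
    return pairs


contacts = [
    {"id": 1, "name": "John Smith", "email": "john@example.com", "phone": "555-0100"},
    {"id": 2, "name": "Jon Smith", "email": "John@Example.com ", "phone": "555-0100"},
    {"id": 3, "name": "Mary Jones", "email": "mary@example.com", "phone": "555-0199"},
]

print(possible_duplicates(contacts))
# [(1, 2, ['email', 'phone'])]
```

The number of matched fields could double as your matching score: the more
fields agree, the more likely the pair is a real duplicate. Comparing every
pair is quadratic, so for a large contact list you would want to group on
each field instead.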

Russ


On Wed, Oct 20, 2010 at 12:53 PM, Mack <mrsmith.w...@gmail.com> wrote:

>
> Hi!
>
> I have a contact application where I need to display possible
> duplicates within the existing contacts. Possible duplicates means
> different contact entries that refer to the same person and might have
> the same or slightly different information (typos).
>
> What I currently do is search for different levels of duplication
> (it's a single union of 3 queries):
> - the first query searches for exact duplicates (exactly the same
> name, address, email, phone, etc);
> - second query searches for matches using the soundex algorithm on a
> restricted set of fields and is given a lower matching score;
> - third query applies soundex on more fields and is given an even
> lower matching score.
>
> Is there a better algorithm or way to do this fuzzy duplication search
> over multiple fields (firstname, lastname, address, etc.)? Pointers to
> Wikipedia, books, etc. appreciated.
>
> --
> Mack
>
> 
